Showing posts from February, 2026

CST 383 - Week 7

In this lecture, I learned about logistic regression and how it functions as a classification model. A key insight was that logistic regression can be viewed as linear regression combined with a sigmoid “squashing” function, which transforms the output into a probability between 0 and 1. I can now describe the model both mathematically and conceptually: it computes a linear combination of the input features and applies the sigmoid function to estimate the probability of a binary outcome. I also learned how to fit a logistic regression model to training data and evaluate its performance with appropriate metrics. For homework, I implemented logistic regression with scikit-learn to predict customer churn: I trained the model, generated predictions, and evaluated its performance. This hands-on practice helped me connect the theory to a practical implementation.
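A minimal sketch of that workflow in scikit-learn, using synthetic data in place of the actual churn dataset (the sample sizes and feature counts below are made-up stand-ins, not the homework data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for customer features and a binary churn label
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns the sigmoid output: a probability in [0, 1]
# for each class; predict thresholds it at 0.5 to give a class label
churn_probs = model.predict_proba(X_test)[:, 1]
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```

The probabilities make this more useful than a bare yes/no classifier: a business could, for example, rank customers by churn probability instead of just labeling them.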

CST 383 - Week 5

Something I would like to analyze is the section on missing data. Before reading it, I did not really consider how important it is to handle missing data properly. I used to think you could just remove rows or ignore the problem, but now I understand that handling missing values is a big part of data science. The built-in functions pandas provides to detect, remove, and replace missing data are very helpful. In the examples from the book and slides the datasets were small, so it looked simple to manage, but I imagine that with much larger datasets these tools become even more important. I am still thinking about when it is better to delete rows or columns versus when it is better to replace the missing values. Deleting data could remove useful information, especially if many rows contain missing values; on the other hand, replacing values with the mean, median, or mode could introduce bias if it is not done carefully.