Muddy Points

Lesson 7: Prediction and Visualization in Simple Logistic Regression

Modified

April 18, 2025

Muddy Points from Spring 2025

1. `geom_ribbon()` to add bands: is that just to view the range of values and/or the confidence interval?

`geom_ribbon()` can be used to add any range of values: it shades the band between whatever `ymin` and `ymax` we give it. In this case, we gave it our calculated confidence limits, so the band shows the confidence interval. It will not calculate a confidence interval for us automatically.
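
Here is a minimal sketch of that idea. The data are simulated, and the names `dat`, `fit`, and `grid` are made up for illustration, so this is not the lesson's dataset or exact code. It assumes a 95% confidence interval built on the logit scale and then converted to probabilities.

```r
library(ggplot2)

# Simulated data (an assumption for illustration, not the lesson's dataset)
set.seed(7)
dat <- data.frame(age = runif(200, 30, 80))
dat$y <- rbinom(200, size = 1, prob = plogis(-4 + 0.07 * dat$age))

fit <- glm(y ~ age, data = dat, family = binomial)

# Predict on the logit scale with standard errors, then build the limits ourselves
grid <- data.frame(age = seq(30, 80, by = 1))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - qnorm(0.975) * pred$se.fit)
grid$upper <- plogis(pred$fit + qnorm(0.975) * pred$se.fit)

# geom_ribbon() only shades between the ymin and ymax columns we hand it
p <- ggplot(grid, aes(x = age)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line(aes(y = prob)) +
  labs(x = "Age", y = "Predicted probability")
p
```

If we swapped `lower` and `upper` for any other pair of columns, the ribbon would shade that range instead.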

2. I had trouble grasping what it means that the predicted probability is not the same as the predicted outcome. Is it because the outcome is calculated differently?

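Our outcome, \(Y\), is a binary value: we can have \(Y=1\) or \(Y=0\). The predicted probability is the estimated \(P(Y=1)\), the probability that \(Y\) is 1, so it can be any value between 0 and 1. For example, a predicted probability of 0.3 means we estimate a 30% chance that \(Y=1\). So the probability is related to the outcome, but it is not the outcome itself.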
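
To see the difference in R, here is a small sketch with simulated data (the data and the names `dat`, `fit`, and `pred_prob` are assumptions for illustration). The observed outcome column only contains 0s and 1s, while the predicted probabilities fall between 0 and 1.

```r
# Simulated data (an assumption for illustration, not the lesson's dataset)
set.seed(7)
dat <- data.frame(age = runif(200, 30, 80))
dat$y <- rbinom(200, size = 1, prob = plogis(-4 + 0.07 * dat$age))

fit <- glm(y ~ age, data = dat, family = binomial)

# The observed outcome Y is always 0 or 1
table(dat$y)

# The predicted probability is the estimated P(Y = 1) for each observation
pred_prob <- predict(fit, type = "response")
summary(pred_prob)

# Side by side: the outcome and its predicted probability are different things
head(data.frame(age = dat$age, y = dat$y, pred_prob = pred_prob))
```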

3. Could you please describe why we need to create the new dataframe with the age variable?

I think this is referring to slide 23 and the `newdata2` variable?

I created a new dataset so that I could have a vector of equally spaced ages. The original observations were not necessarily evenly distributed across the range of age values. By predicting at each value in an evenly spaced vector of ages, I can draw a smooth line for the predicted probability across ages.
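
A rough sketch of that step, using simulated data rather than the lesson's dataset. The name `newdata2` comes from the question above, but the number of grid points (100) and everything else here are assumptions.

```r
# Simulated data (an assumption for illustration, not the lesson's dataset)
set.seed(7)
dat <- data.frame(age = runif(200, 30, 80))
dat$y <- rbinom(200, size = 1, prob = plogis(-4 + 0.07 * dat$age))

fit <- glm(y ~ age, data = dat, family = binomial)

# Observed ages are unevenly spaced, with gaps and clusters
head(sort(dat$age))

# New data frame: 100 equally spaced ages covering the observed range
newdata2 <- data.frame(age = seq(min(dat$age), max(dat$age), length.out = 100))

# Predicted probability at each age in the grid, ready to plot as a smooth line
newdata2$pred_prob <- predict(fit, newdata = newdata2, type = "response")
head(newdata2)
```

The column in the new data frame has to be named `age`, matching the predictor in the model, or `predict()` will not find it.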

4. When would we use `type=link`? Or would we always use response for the probability scale?

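We use `type = "link"` when we want the predicted log-odds (on the logit scale) and `type = "response"` when we want the predicted probability. Sometimes we need to work on the logit scale before converting to the probability scale. Confidence limits are a common example: they are typically calculated on the logit scale and then transformed to probabilities. In those cases, we will want the predicted log-odds!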
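
As a quick check of how the two scales relate, here is a sketch with simulated data (the data and the names `dat`, `fit`, and `new_ages` are assumptions for illustration). The link-scale prediction is the linear predictor, and applying the inverse logit, `plogis()`, gives the same values as `type = "response"`.

```r
# Simulated data (an assumption for illustration, not the lesson's dataset)
set.seed(7)
dat <- data.frame(age = runif(200, 30, 80))
dat$y <- rbinom(200, size = 1, prob = plogis(-4 + 0.07 * dat$age))

fit <- glm(y ~ age, data = dat, family = binomial)
new_ages <- data.frame(age = c(40, 60, 80))

# Predicted log-odds (logit scale) and predicted probabilities
log_odds <- predict(fit, newdata = new_ages, type = "link")
prob     <- predict(fit, newdata = new_ages, type = "response")

# The log-odds are just intercept + slope * age
manual <- coef(fit)[1] + coef(fit)[2] * new_ages$age
all.equal(unname(log_odds), unname(manual))

# The inverse logit of the log-odds matches the response-scale prediction
all.equal(plogis(log_odds), prob)
```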

Muddy Points from Spring 2024

None?? Wowza!