Muddy Points
Lesson
Muddy Points from 2025
1. I had a little bit of trouble visualizing the “trustworthiness” of how reliable the data is on a graph
Exactly. We can get an idea of the “trustworthiness” from a graph, but we can never be certain. That’s why we gotta use the math! The math will exactly quantify that “trustworthiness.”
2. when to use a CI vs a hypothesis test to convey information
Big question. For this class, when each is asked (heh). This is mostly so we get practice doing both and can do either in a quicker, more informal way.
For real life, people will report the p-value of a hypothesis test as they make a more meaningful statement about the coefficient estimate. No one really writes “We rejected the null hypothesis that …” However, they DO write “We found significant association with number of visits to the doctor (p-value = 0.008).” And even more likely, they are writing “We found that for an increase of one doctor visit per year, there was a 0.5 decrease in the expected number of 911 calls (95%CI: 0.2, 0.8). All that to say that people typically use CIs when doing a presentation or written report. When chatting with colleagues, you may informally discuss the hypothesis test.
3. Getting a little lost between all the terms (with hats, without hats etc.)
It’s hard! My main advice is to keep practicing! Hats mean we are referring to the result of our estimate. We can replace any time we write “\(\hat{\beta_1}\)” with the actual value we found in the regression table. For \(\beta_1\), we cannot replace it with an actual value. We can hypothesize what that value is, but we can never replace \(\beta_1\) with a 100% certain value.
4. The equation for confidence bands around the linear model completely lost me
5. trying to keep up with when to talk about the population model and when to talk about the sample estimate
The population model is basically the plan for what we will do. It’s our guide, our blueprint. It does not actually exist in the world though! That’s becuase it is just a way to represent the true, underlying relationship that we cannot fully uncover.
That’s where the sample estimated model comes in. We can take the population model (our plan, guide, blueprint) and then implement it using data. Then we have the estimates! Those estimates are our closest bet to the population from our data.
You can think of it like the blueprint to a house. We have a plan. It’s our ideal structure for the home. If we have really good materials and evreyhting we need (think really good sample), then we can execute the blueprint pretty well. Our house will resemble the blueprint closely. It will never exactly be the blueprint, but it will get close. However, if our materials are lacking, and maybe someone forgot their measuring tape at home, then our house will not resemble our blueprint as nicely. There are many different houses that can come from the same blueprint.
6. The difference between when to use inference and prediction as the words seemed to used interchangeably.
Prediction means we are using the model to predict the outcome. Inference means we are using the model to discuss the relationship between variables.
For the most part, models learned in this class will be most appropriate for inference. Prediction requires much more information, flexibility, and variables to make the predicted value more accurate.
7. Why do we calculate SSE with the square of the residuals? Why not just the sum of the residuals themselves?
Remember that the residuals can be negative or positive depending what side of the best-fit line they fall. If we sum them, no matter how far they are from the line, it is likely the sum will be close to zero. To make sure the negative and positive residuals do not cancel each other out, we square them!
8. In the hypothesis test for the population slope, why was the degrees of freedom calculated as \(df = n-2\), instead of \(df = n-1\)?
This is because our model is what gives us the degrees of freedom. We are performing the hypothesis test on one coefficient, \(\beta_1\), but our model is estimating 2 coefficients! Thus we are restricted to \(n-2\) degrees of freedom.
9. What is the best way to calculate the standard error for B HAT 1, using a regression table?
The best way is to look at the std. error column and then find the row that has the X-variable. The number in the table is the standard error for \(\beta_1\).
Muddy Points from 2024
1. “all of the different manifestations of t”
I love the way this person said it!
So I’ve sorted this out:
We say \(T\) follows a t-distribution
- \(T\) is the general name for the variable (like \(X\) or \(Y\))
We calculate a given \(t\)-value and call that \(t\)
- We also call this the test statistic
The critical value that corresponds to a specific confidence interval and \(\alpha\) is labelled \(t^*\)
2. What’s the difference between SD and variance?
SD (standard deviation) is the square root of the variance. That’s why I sometimes write \(\sigma\) (standard deviation) or \(\sigma^2\) (variance) when I’m talking about the distribution of residuals.
\[ \sigma = \sqrt{\sigma^2} \]
Variance is usually easier to work with mathematically, but standard deviation is in the units that match a variable. For example, the variance of 10 height measurements are in square inches, but the standard deviation are in inches.
3. Why is it important to test if \(\beta_1\) is equal to zero? Is \(\beta_1=0\) the same as the x and y variables having no correlation?
Let’s answer the second question: Yes! It is the same in simple linear regression. When we get to multiple linear regression, and have several variables/coefficients in our model, testing \(\beta_1=0\) won’t be the same as testing the correlation.
In simple linear regression, it is important to test \(\beta_1\) mostly for pedagogical reasons. It’s just helpful to establish the process in a simpler setting.
4. SSE and sigma
We were looking at the relationship between SSE and \(\widehat\sigma^2\):
\[ \widehat{\sigma}^2 = \frac{1}{n-2}SSE \]
The sum of square errors is \(SSE = \sum_{i=1}^n (Y_i - \widehat{Y}_i)^2 = \sum_{i=1}^n \epsilon_i^2\)
The definition of variance is the sum of the squared differences between values and their mean.
So if I had a variable \(S\), with 100 observations, the mean of \(S\), which we call \(\overline{S}\), would be \(\frac{\sum_{i=1}^{100} S_i}{100}\). The variance of \(S\) would be \(\sum_{i=1}^{100} (S_i - \overline{S})^2\).
Now, let’s get back to the sum of square errors: \(SSE = \sum_{i=1}^n \epsilon_i^2\)
The variance of the residuals would be \(\sum_{i=1}^n (\epsilon_i - \overline{\epsilon})^2\). The mean of \(\epsilon\), \(\overline\epsilon\), should be 0 by our assumptions. So the variance of the residuals is \(\sum_{i=1}^n \epsilon_i^2\) which is our SSE!
There is some more complicated math that goes into why our variance is divided by n-2 to get the estimated variance of the residuals, but that’s basically it!