Muddy Points

Lesson

Modified: March 4, 2026

Muddy Points 2024

1. What models or values are we comparing in VIF?

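Mmm good question! The VIF for a covariate compares the variance of its coefficient in our model to the variance it would have if that covariate were uncorrelated with all of the other covariates. To get it, we regress that covariate on all the other covariates in the model; if that regression has an R² of R_j², then VIF_j = 1 / (1 − R_j²). VIFs work for continuous and binary variables. So if your model only has continuous or binary covariates, then the VIFs and GVIFs are the same, and you can use either. The GVIFs are needed for multi-level categorical covariates, which take up more than one coefficient in the model.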
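To make "what are we comparing" concrete, here is a sketch with simulated data (x1, x2, group, and y are made-up variables for illustration, and it assumes the car package is installed). It computes the VIF for x1 by hand from the auxiliary regression of x1 on the other covariates, then compares it to what car::vif() reports once a 3-level factor is in the model.

```r
library(car)  # provides vif()

set.seed(2024)
n     <- 300
x1    <- rnorm(n)
x2    <- 0.8 * x1 + rnorm(n, sd = 0.5)                        # continuous, correlated with x1
group <- factor(sample(c("A", "B", "C"), n, replace = TRUE))  # 3-level categorical covariate
y     <- 1 + x1 + x2 + as.numeric(group) + rnorm(n)
dat   <- data.frame(y, x1, x2, group)

fit <- lm(y ~ x1 + x2 + group, data = dat)

# Auxiliary model for x1: regress x1 on all the OTHER covariates
r2_x1  <- summary(lm(x1 ~ x2 + group, data = dat))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# Because group has more than one df, car::vif() returns a matrix with
# columns GVIF, Df, and GVIF^(1/(2*Df)) instead of a plain vector
gvif_tab <- car::vif(fit)
gvif_tab

# For the 1-df term x1, the GVIF is just the ordinary VIF
c(by_hand = vif_x1, car = gvif_tab["x1", "GVIF"])
```

The last line should show the same number twice, which is why VIFs and GVIFs agree for continuous and binary covariates.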

2. Still a little confused on the context of when we use a centered value vs not in our model.

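You can always center a variable! There are two scenarios where centering is really helpful:

  1. When we have an interaction. Centering makes the main-effect coefficients more interpretable: each one becomes the effect of that variable at the mean of the other variable, instead of at a value of 0 that may not make sense (like a BMI of 0).

  2. When we have a transformation of the variable, like a squared term. Centering reduces the multicollinearity between the variable and its transformation.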
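Here is a small sketch of scenario 1 using simulated data (age, bmi, and y are made-up illustration variables, not course data). Centering both variables before fitting the interaction changes what the age main effect means (the slope of age at the mean BMI instead of at a BMI of 0), while the interaction coefficient and the fitted values stay exactly the same.

```r
set.seed(42)
n   <- 200
age <- runif(n, 30, 70)
bmi <- rnorm(n, mean = 27, sd = 4)
y   <- 5 + 0.3 * age + 0.5 * bmi + 0.02 * age * bmi + rnorm(n)
dat <- data.frame(y, age, bmi,
                  age_c = age - mean(age),   # age centered at its mean
                  bmi_c = bmi - mean(bmi))   # bmi centered at its mean

fit_raw <- lm(y ~ age * bmi, data = dat)      # age coef = slope of age when bmi = 0
fit_c   <- lm(y ~ age_c * bmi_c, data = dat)  # age_c coef = slope of age at mean bmi

coef(fit_raw)
coef(fit_c)

# The centered age slope is the raw age slope evaluated at the mean BMI
unname(coef(fit_raw)["age"] + coef(fit_raw)["age:bmi"] * mean(dat$bmi))
```

The last line should match the age_c coefficient from fit_c.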

3. What is the difference between multicollinearity vs confounding vs effect modification?

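Here’s a pretty good video about the differences! It’s about 8 minutes long, but easy to watch at 1.25x or 1.5x speed. In short: multicollinearity is when covariates are highly correlated with each other, which inflates the standard errors of their coefficients; confounding is when a third variable is associated with both the exposure and the outcome and distorts the estimated association between them; and effect modification is when the association between the exposure and the outcome differs across levels of a third variable, which we model with an interaction.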

4. Why would we have both age and age squared in a model?

We would only include both age and age-squared if we noticed that the relationship between age and our outcome was curved rather than linear. For example, our plot could look like this:

library(ggplot2)
ggplot(df, aes(x = age, y = y)) + geom_point() + geom_smooth()

And let’s say we see the following plot for age-squared:

ggplot(df, aes(x = age_sq, y = y)) + geom_point() + geom_smooth()

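Then we would add age-squared to our model as a transformation of age. When we include age-squared in the model, we still need to keep age itself: without the linear term, the model forces the curve’s peak (or trough) to sit at age 0. We can run the model with both: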

mod = lm(y ~ age + age_sq, data = df)
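The age_sq column has to exist in df before we can use it. A minimal sketch, assuming only that df has a numeric age column (here df is simulated with made-up coefficients, so the numbers won't match the table below): add the squared term as its own column, or square age inside the formula with I(). Both give the same fit.

```r
set.seed(1)
# Simulated stand-in for df, with a curved age-outcome relationship
df <- data.frame(age = runif(500, 20, 80))
df$y <- -80 + 4 * df$age + 0.5 * df$age^2 + rnorm(500, sd = 50)

df$age_sq <- df$age^2   # add the squared term as its own column

mod   <- lm(y ~ age + age_sq, data = df)
mod_i <- lm(y ~ age + I(age^2), data = df)  # I() squares age inside the formula

coef(mod)
coef(mod_i)
```

Precomputing age_sq keeps the column name short in tables, while I(age^2) avoids having to remember to recreate the column after filtering or recoding age.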

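And we can look at the regression table. Notice that the standard errors of the age and age-squared coefficients look okay, but the intercept’s standard error is really big. The intercept is the predicted outcome at age 0, which is far from most of the ages in our data.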

library(broom)  # tidy()
library(gt)     # gt(), fmt_number()
tidy(mod, conf.int = TRUE) %>% gt() %>% fmt_number(decimals = 2)
term         estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)    −87.51      36.05      −2.43     0.02   −158.35     −16.67
age              3.67       1.60       2.29     0.02      0.53       6.82
age_sq           0.57       0.02      35.32     0.00      0.54       0.60

We can also look at the VIF:

rms::vif(mod)
    age  age_sq 
38.0167 38.0167 
car::vif(mod) # will only give us GVIF if there is a multi-level covariate in the model
    age  age_sq 
38.0167 38.0167 

The VIFs are really big (well above the rule-of-thumb cutoffs of 5 or 10 that are often used), so centering age should help with the multicollinearity in the model.

mod2 = lm(y ~ age_c + age_c_sq, data = df)

Here, age_c is age centered at its mean (age minus the mean age), and age_c_sq is the square of the centered age (age_c²).
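Why does centering help? When every age is far from 0, age and age² rise together almost in lockstep, so they are nearly perfectly correlated. After centering, ages below the mean get negative age_c values while age_c² stays positive, which breaks up that correlation. With only two predictors, each VIF is 1 / (1 − r²), where r is the correlation between them. A sketch with simulated ages (uniform between 20 and 80, an assumption made only for illustration):

```r
set.seed(1)
df <- data.frame(age = runif(500, 20, 80))  # simulated ages, for illustration only

df$age_sq   <- df$age^2
df$age_c    <- df$age - mean(df$age)   # age centered at its mean
df$age_c_sq <- df$age_c^2              # square of the CENTERED age

r_raw <- cor(df$age, df$age_sq)
r_c   <- cor(df$age_c, df$age_c_sq)

# With two predictors, VIF = 1 / (1 - r^2) for both of them
c(r_raw = r_raw, vif_raw = 1 / (1 - r_raw^2))
c(r_c = r_c, vif_c = 1 / (1 - r_c^2))
```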

tidy(mod2, conf.int = TRUE) %>% gt() %>% fmt_number(decimals = 2)
term         estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)  1,459.01       6.96     209.68     0.00  1,445.34   1,472.68
age_c           59.52       0.26     229.32     0.00     59.01      60.03
age_c_sq         0.57       0.02      35.32     0.00      0.54       0.60
car::vif(mod2) # will only give us GVIF if there is a multi-level covariate in the model
   age_c age_c_sq 
1.000054 1.000054 

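Yay! The VIFs are much better now (about 1 instead of 38)! And the intercept and age coefficients have much smaller standard errors (36.05 → 6.96 for the intercept and 1.60 → 0.26 for age). Notice that the age-squared coefficient and its standard error didn’t change at all: centering doesn’t change the fitted curve, only what the intercept and the linear age coefficient mean. The intercept is now the predicted outcome at the mean age, and the age_c coefficient is the slope of the curve at the mean age.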
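To check this yourself, here is a sketch on simulated data (the ages, outcome, and coefficients are made up, so the numbers won't match the tables above). If b1 and b2 are the raw age and age-squared coefficients and ā is the mean age, the centered linear coefficient should equal b1 + 2·b2·ā, the squared coefficient should be identical, and the fitted values should match.

```r
set.seed(1)
df <- data.frame(age = runif(500, 20, 80))          # simulated ages
df$y <- -80 + 4 * df$age + 0.5 * df$age^2 + rnorm(500, sd = 50)

df$age_sq   <- df$age^2
df$age_c    <- df$age - mean(df$age)
df$age_c_sq <- df$age_c^2

mod  <- lm(y ~ age + age_sq, data = df)
mod2 <- lm(y ~ age_c + age_c_sq, data = df)

b     <- coef(mod)
a_bar <- mean(df$age)

# Centered linear slope = slope of the raw curve at the mean age
c(from_raw = unname(b["age"] + 2 * b["age_sq"] * a_bar),
  centered = unname(coef(mod2)["age_c"]))

# Same squared coefficient, same fitted values
all.equal(unname(b["age_sq"]), unname(coef(mod2)["age_c_sq"]))
all.equal(fitted(mod), fitted(mod2))
```

Both all.equal() calls should return TRUE, confirming that the two models are the same curve written two different ways.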