2024-02-26
We can start by visualizing the relationship between life expectancy and female literacy rate by food supply
Question of interest: Does the effect of female literacy rate on life expectancy differ depending on food supply?
Let’s run an interaction model to see!
Model we are fitting:
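Judging from the terms in the regression table further down (`FLR_c`, `FS_c`, `FLR_c:FS_c`), the model can plausibly be written as below. The $\beta$ and $\epsilon$ notation is an assumption made here for illustration, with $FLR^c$ and $FS^c$ standing for the centered female literacy rate and centered food supply.

$$
\text{LE} = \beta_0 + \beta_1 \, FLR^c + \beta_2 \, FS^c + \beta_3 \, FLR^c \cdot FS^c + \epsilon
$$

Under this form, $\beta_3$ is the coefficient that the interaction test targets.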
In R:
OR
tidy_m_fs = tidy(m_int_fs, conf.int = TRUE)
tidy_m_fs %>% gt() %>% tab_options(table.font.size = 35) %>% fmt_number(decimals = 5)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 70.32060 | 0.72393 | 97.13721 | 0.00000 | 68.87601 | 71.76518 |
FLR_c | 0.15532 | 0.03808 | 4.07905 | 0.00012 | 0.07934 | 0.23130 |
FS_c | 0.00849 | 0.00182 | 4.67908 | 0.00001 | 0.00487 | 0.01212 |
FLR_c:FS_c | −0.00001 | 0.00008 | −0.06908 | 0.94513 | −0.00016 | 0.00015 |
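As a rough sketch of how a table like this could be produced, the code below fits the interaction model in two equivalent ways. It assumes the two options are the long formula (main effects plus interaction) and the `*` shorthand. The data frame `sim_dat` and all of its values are simulated stand-ins, not the lecture data; only the column names mirror the table above.

```r
# Simulated stand-in data: `sim_dat` and its values are hypothetical,
# only the column names match the terms in the table above.
set.seed(1)
n <- 72
sim_dat <- data.frame(
  FLR_c = rnorm(n, mean = 0, sd = 20),
  FS_c  = rnorm(n, mean = 0, sd = 400)
)
sim_dat$LifeExpectancyYrs <- 70 + 0.15 * sim_dat$FLR_c +
  0.008 * sim_dat$FS_c + rnorm(n, sd = 5)

# Option 1: main effects written out explicitly, plus the interaction
m_long <- lm(LifeExpectancyYrs ~ FLR_c + FS_c + FLR_c * FS_c, data = sim_dat)

# Option 2: `*` alone expands to both main effects and their interaction
m_short <- lm(LifeExpectancyYrs ~ FLR_c * FS_c, data = sim_dat)

# Both formulas give the same four coefficients
same_fit <- isTRUE(all.equal(coef(m_long), coef(m_short)))
print(names(coef(m_short)))
print(same_fit)
```

Either formula can be used; the shorthand is simply less typing when both main effects should stay in the model.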
To identify different lines, we need to pick example values of Food Supply:
Food Supply of 1812 kcal PPD
Food Supply of 2812 kcal PPD
Food Supply of 3812 kcal PPD
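To see why picking a food supply value identifies a line, it may help to group the fitted model by female literacy rate. Writing $\hat\beta_0, \dots, \hat\beta_3$ for the four estimates in the table above (in row order) and $c$ for a chosen value of centered food supply (notation assumed here, not from the original slide), the fitted line is approximately:

$$
\widehat{\text{LE}} = (\hat\beta_0 + \hat\beta_2 c) + (\hat\beta_1 + \hat\beta_3 c)\, FLR^c
\approx (70.32060 + 0.00849\,c) + (0.15532 - 0.00001\,c)\, FLR^c
$$

Each example value gives its own intercept and slope. Because $\hat\beta_3$ is close to zero (and shown only to five decimals), the slopes for the three example values are nearly identical.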
Interpretation:
In summary, the interaction term can be interpreted as “difference in adjusted female literacy rate effect for every 1 kcal PPD increase in food supply”
It will be helpful to test the interaction to round out this interpretation!!
Null
Alternative
Null / Smaller / Reduced model
Alternative / Larger / Full model
term | df.residual | rss | df | sumsq | statistic | p.value |
---|---|---|---|---|---|---|
LifeExpectancyYrs ~ FLR_c + FS_c | 69.000 | 2,005.556 | NA | NA | NA | NA |
LifeExpectancyYrs ~ FLR_c + FS_c + FLR_c * FS_c | 68.000 | 2,005.415 | 1.000 | 0.141 | 0.005 | 0.945 |
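As a quick check on the table above, the partial F statistic can be recomputed by hand from the residual sums of squares and residual degrees of freedom. This is only a sketch using the rounded values shown in the table, so the result agrees with the reported statistic only up to rounding.

```r
# Values copied from the ANOVA table above (already rounded)
rss_reduced <- 2005.556
df_reduced  <- 69
rss_full    <- 2005.415
df_full     <- 68

# Partial F: drop in RSS per extra coefficient, relative to the
# full model's residual mean square
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)

# Upper-tail probability from the F distribution
p_value <- pf(f_stat, df1 = df_reduced - df_full, df2 = df_full,
              lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 3))
```

With a single interaction coefficient, this F statistic is the square of the t statistic for `FLR_c:FS_c` in the coefficient table, which is why the two p-values match.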
This is more of a model selection question (in coming lectures)
But if we had a model with only TWO covariates, we could step through the following process:
Test the interaction (of potential effect modifier): use a partial F-test to test if interaction term(s) explain enough variation compared to model without interaction
Then look at the main effect (or potential confounder)
If interaction already included, then automatically included as main effect (and thus not checked for confounding)
For variables that are not included in any interactions:
Often we use
The test statistic is
The calculated test statistic is
We are generally calculating:
We (reject/fail to reject) the null hypothesis at the
We test with
Follow the F-test procedure in Lesson 9 (MLR: Inference/F-test)
Use the hypothesis tests for the specific variable combo:
Binary & continuous variable (Lesson 11, LOB 2)
Testing a single coefficient for the interaction term using F-test comparing full model to reduced model
Multi-level & continuous variables (Lesson 11, LOB 3)
Testing group of coefficients for the interaction terms using F-test comparing full to reduced model
Binary & multi-level variable (Lesson 11, LOB 4)
Testing group of coefficients for the interaction terms using F-test comparing full to reduced model
Two continuous variables (Lesson 11, LOB 5)
Testing a single coefficient for the interaction term using F-test comparing full to reduced model
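The "group of coefficients" case can be sketched with a multi-level and a continuous variable. The example below uses simulated data with hypothetical names (`sim_dat`, `x`, `grp`, `y`) and base R's `anova()` to compare the reduced and full models; it is an illustration of the procedure, not the lecture's own analysis.

```r
# Simulated stand-in data: a continuous x and a 3-level factor grp
set.seed(2)
n <- 90
sim_dat <- data.frame(
  x   = rnorm(n),
  grp = factor(rep(c("a", "b", "c"), each = n / 3))
)
sim_dat$y <- 1 + 0.5 * sim_dat$x + as.numeric(sim_dat$grp) + rnorm(n)

# Reduced model: main effects only
m_reduced <- lm(y ~ x + grp, data = sim_dat)

# Full model: main effects plus the x-by-grp interaction
m_full <- lm(y ~ x + grp + x:grp, data = sim_dat)

# Partial F-test of all interaction coefficients at once
f_test <- anova(m_reduced, m_full)
print(f_test)

# Number of coefficients tested: (3 levels - 1) interaction terms
n_tested <- f_test$Df[2]
print(n_tested)
```

With two continuous variables or a binary and a continuous variable, the same call tests a single coefficient, so the `Df` entry would be 1.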
If interaction already included:
For variables that are not included in any interactions:
If the main effect of the primary explanatory variable changes by less than 10%, then the additional variable is neither an effect modifier nor a confounder
Let’s say we have
If we are only considering
Fitted model 1 / reduced model (mod1):
Fitted model 2 / full model (mod2):
Calculation for % change in coefficient
Run models with and without food supply:
Note that the full model when testing for confounding was the reduced model for testing an interaction
Full and reduced are always relative qualifiers of the models that we are testing
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 70.29722 | 0.72578 | 96.85709 | 0.00000 | 68.84969 | 71.74475 |
FLR_c | 0.22990 | 0.03219 | 7.14139 | 0.00000 | 0.16570 | 0.29411 |
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 70.29722 | 0.63537 | 110.63985 | 0.00000 | 69.02969 | 71.56475 |
FLR_c | 0.15670 | 0.03216 | 4.87271 | 0.00001 | 0.09254 | 0.22085 |
FS_c | 0.00848 | 0.00179 | 4.72646 | 0.00001 | 0.00490 | 0.01206 |
The percent change in female literacy rate’s coefficient estimate was 46.71%.
Thus, food supply is a confounder of the association between life expectancy and female literacy rate.
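The 46.71% reported above can be reproduced from the two `FLR_c` estimates in the tables. The sketch below assumes the percent change is taken relative to the full-model estimate, which is the choice that matches the reported value; `pct_change` is a hypothetical helper name.

```r
# Hypothetical helper: percent change in a coefficient, measured
# relative to the estimate from the full model (assumed convention)
pct_change <- function(b_reduced, b_full) {
  100 * (b_reduced - b_full) / b_full
}

# FLR_c estimates copied from the two tables above
flr_change <- pct_change(b_reduced = 0.22990, b_full = 0.15670)
print(round(flr_change, 2))
```

Since the change is well above the 10% threshold, the conclusion about confounding does not hinge on rounding.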
Look back at income level and world region: is income level an effect modifier, confounder, or has no effect on the association between life expectancy and world region?
We can start by visualizing the relationship between life expectancy and world region by income level
So we’ll need to revisit the work we did in previous slides on the interaction, then check for confounding
Step 1: Testing the interaction/effect modifier
Compare model with and without interaction using F-test to see if interaction is significant
Models
Step 2: Testing a confounder (only if not an effect modifier)
Compare model with and without main effect for additional variable (income level) using F-test to see if additional variable (income level) is a confounder
Models
term | df.residual | rss | df | sumsq | statistic | p.value |
---|---|---|---|---|---|---|
LifeExpectancyYrs ~ income_levels2 + four_regions | 67.000 | 1,693.242 | NA | NA | NA | NA |
LifeExpectancyYrs ~ income_levels2 + four_regions + income_levels2 * four_regions | 64.000 | 1,681.304 | 3.000 | 11.938 | 0.151 | 0.928 |
Conclusion: There is not a significant interaction between world region and income level (p = 0.928).
Thus, income level is not an effect modifier of world region. However, we can continue to test if income level is a confounder.
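As a check, the F statistic in the table above can be recovered from the residual sums of squares and degrees of freedom it reports. The $RSS$ and $df$ subscripts ($R$ for reduced, $F$ for full) are notation assumed here.

$$
F = \frac{(RSS_R - RSS_F)/(df_R - df_F)}{RSS_F / df_F}
  = \frac{(1693.242 - 1681.304)/(67 - 64)}{1681.304/64}
  = \frac{11.938/3}{26.270} \approx 0.151
$$

The numerator has 3 degrees of freedom because the interaction adds three coefficients to the model.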
Record the coefficient estimates for world region in both models:
Model 1 (reduced):
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 61.27000 | 1.16508 | 52.58870 | 0.00000 | 58.94512 | 63.59488 |
four_regionsAmericas | 14.33000 | 1.90257 | 7.53193 | 0.00000 | 10.53349 | 18.12651 |
four_regionsAsia | 8.11824 | 1.71883 | 4.72313 | 0.00001 | 4.68837 | 11.54810 |
four_regionsEurope | 14.78217 | 1.59304 | 9.27924 | 0.00000 | 11.60332 | 17.96103 |
Model 2 (full):
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 60.54716 | 1.16190 | 52.11048 | 0.00000 | 58.22800 | 62.86632 |
four_regionsAmericas | 12.04102 | 2.05816 | 5.85038 | 0.00000 | 7.93292 | 16.14912 |
four_regionsAsia | 7.77808 | 1.66414 | 4.67394 | 0.00001 | 4.45645 | 11.09971 |
four_regionsEurope | 12.51938 | 1.79139 | 6.98864 | 0.00000 | 8.94375 | 16.09501 |
income_levels2Higher income | 3.61419 | 1.46967 | 2.45917 | 0.01651 | 0.68070 | 6.54767 |
Calculate the percent change for the Americas coefficient
Calculate the percent change for the Asia coefficient
Calculate the percent change for the Europe coefficient
Note that two of these % changes are greater than 10%, and one is less than 10%…
There is no set rule when we have more than one estimated coefficient that we examine for confounding
In this, I would consider
Thus, I would conclude that income level is a confounder, so we would leave income level’s main effect in the model
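The three percent changes can be worked out from the world region estimates in the Model 1 and Model 2 tables above. This sketch assumes the change is measured relative to the full-model estimate; the vector names are hypothetical.

```r
# World region estimates copied from the two tables above
reduced <- c(Americas = 14.33000, Asia = 8.11824, Europe = 14.78217)
full    <- c(Americas = 12.04102, Asia = 7.77808, Europe = 12.51938)

# Percent change relative to the full-model estimates (assumed convention)
pct_change <- 100 * (reduced - full) / full
print(round(pct_change, 2))

# Which coefficients move by more than the 10% threshold?
exceeds_10 <- abs(pct_change) > 10
print(exceeds_10)
```

Americas and Europe change by roughly 19% and 18%, while Asia changes by about 4%, which matches the note that two changes exceed 10% and one does not.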
Interpretation:
where the “
In summary, the interaction term can be interpreted as “difference in adjusted
Identify outcome (Y) and primary explanatory (X) variables
Decide which other variables might be important and could be potential confounders. Add these to the model.
(Optional step) Test 3-way interactions
Test 2-way interactions
When testing a 2-way interaction, make sure the full and reduced models contain the main effects
First test all the 2 way interactions together using a partial F-test (with
Remaining main effects - to include or not to include?
For variables that are included in any interactions, they will be automatically included as main effects and thus not checked for confounding
For variables that are not included in any interactions:
Check to see if they are confounders by seeing whether exclusion of the variable(s) changes any of the coefficients of the primary explanatory variable X (including its interactions) by more than 10%
Interactions 2