Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.
Questions
The purpose of the problem below is to integrate what we have learned so far into a simple process that might be embedded in our analysis. This will help you see how many of our learning objectives connect in a single workflow. Some of the things we have learned that will be covered:
Choosing what to test
Interpretations of coefficients (with and without other covariates in the model)
F-test procedures and conclusions
Testing if a covariate is an effect modifier, confounder, or nothing
Question 1
We are going to revisit the Palmer Penguins dataset from Homework 4. This question covers choosing what to test, interpretations of coefficients, F-test conclusions, and interactions.
For this problem we will be using the penguins dataset from the palmerpenguins R package. We will look at the association between flipper length of penguins (measured in mm) and specific species of penguins.
Description from help file:
Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
# first install the palmerpenguins package
# install.packages("palmerpenguins")
library(palmerpenguins)
Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':
penguins, penguins_raw
data(penguins)
# run the command below to learn more about the variables in the penguins dataset
# ?penguins
Part a
Make a plot of flipper length (outcome) and body mass (explanatory variable). Discuss what you see in the plot.
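One way to draw this plot is sketched below. It assumes the ggplot2 package is installed; the object names `penguins_cc` and `p` are illustrative and do not come from the homework.

```r
library(palmerpenguins)
library(ggplot2)

# Keep only rows where both plotted variables are observed
penguins_cc <- subset(
  penguins,
  !is.na(body_mass_g) & !is.na(flipper_length_mm)
)

# Scatterplot of flipper length against body mass with a least-squares line
p <- ggplot(penguins_cc, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Body mass (g)", y = "Flipper length (mm)")
p
```

When discussing the plot, useful things to comment on include the direction of the trend, whether a straight line looks reasonable, and whether the spread is roughly constant across body mass.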
Part b
Write the simple linear regression model that we will fit for the association between body mass and flipper length. If you use any shorthand, please write it out. For example: Let \(BD\) represent bill depth.
\[
FL = \beta_0 + \beta_1 BM + \epsilon
\]
where \(FL\) is flipper length (mm) and \(BM\) is body mass (g).
Part c
Run the simple linear regression model for the association between body mass and flipper length. Display the regression table output.
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| body_mass_g | 0.02 | 0.01, 0.02 | <0.001 |

Abbreviation: CI = Confidence Interval
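A minimal sketch of fitting this model in base R follows. The object name `model_crude` is illustrative. The table above resembles output from a table-formatting package, so the layout here differs and shows more decimal places.

```r
library(palmerpenguins)

# Simple linear regression: flipper length (mm) on body mass (g).
# lm() drops rows with missing values in either variable by default.
model_crude <- lm(flipper_length_mm ~ body_mass_g, data = penguins)

# Coefficient estimates alongside their 95% confidence intervals
cbind(estimate = coef(model_crude), confint(model_crude, level = 0.95))
```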
Part d
Interpret the coefficient for body mass. Note that as we move forward with a multivariate model, we will refer to this estimate as the crude or unadjusted coefficient estimate.
Not given
Part e
Discuss how centering body mass might help with interpretability. Then, center body mass around the mean, run the model again, and display the regression table. Do the intercept and/or slope change from Part c?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 200.915 | 0.374 | 537.446 | 0.000 | 200.180 | 201.651 |
| bm_g_c | 0.015 | 0.000 | 32.722 | 0.000 | 0.014 | 0.016 |
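The centering step can be sketched as below. The variable name `bm_g_c` comes from the table above; `penguins_c`, `model_crude`, and `model_centered` are illustrative names. The sketch assumes centering at the sample mean of the non-missing body masses.

```r
library(palmerpenguins)

# Center body mass at its sample mean (missing values ignored when averaging)
penguins_c <- penguins
penguins_c$bm_g_c <- penguins_c$body_mass_g -
  mean(penguins_c$body_mass_g, na.rm = TRUE)

model_crude    <- lm(flipper_length_mm ~ body_mass_g, data = penguins_c)
model_centered <- lm(flipper_length_mm ~ bm_g_c, data = penguins_c)

# The slope is unchanged by centering; only the intercept moves
coef(model_crude)
coef(model_centered)
```

After centering, the intercept is the expected flipper length for a penguin of average body mass, rather than for a penguin weighing 0 g.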
Part f
Make a plot of flipper length (outcome) and body mass (explanatory variable) by bill depth. Discuss what you see in the plot. (Hint: bill depth will be the color in the plot.)
Part g
Make a plot of flipper length (outcome) and body mass (explanatory variable) by penguin species. Discuss what you see in the plot and relate it back to the plot in Part f.
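The plots in Parts f and g could be sketched as follows, assuming ggplot2 is installed. The object names `penguins_cc`, `p_depth`, and `p_species` are illustrative.

```r
library(palmerpenguins)
library(ggplot2)

# Keep rows where all plotted variables are observed
penguins_cc <- subset(
  penguins,
  !is.na(body_mass_g) & !is.na(flipper_length_mm) & !is.na(bill_depth_mm)
)

# Part f: bill depth (continuous) mapped to point color
p_depth <- ggplot(
  penguins_cc,
  aes(x = body_mass_g, y = flipper_length_mm, color = bill_depth_mm)
) +
  geom_point() +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Bill depth (mm)")

# Part g: species mapped to color, with one least-squares line per species
p_species <- ggplot(
  penguins_cc,
  aes(x = body_mass_g, y = flipper_length_mm, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Species")

p_depth
p_species
```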
Part h
Using only body mass and bill depth as covariates, write out the model that we would fit including the main effects of body mass and bill depth and their interaction. How many coefficients are tested when we test for a significant interaction?
Note
Both covariates should be centered. For the rest of the homework, we will use the centered body mass and bill depth.
1 (the single coefficient on the body mass by bill depth interaction term)
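One way to write the model for this part is shown below. The superscript notation is an assumption made here: \(BM^c\) and \(BD^c\) denote centered body mass and centered bill depth.

\[
FL = \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 BM^c \cdot BD^c + \epsilon
\]

Because both covariates are continuous, the interaction adds a single term, so the test for interaction has the null hypothesis \(H_0: \beta_3 = 0\).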
Part i
Center bill depth.
Not given
Part j
Using only body mass and bill depth as covariates, test if bill depth is an effect modifier.
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| flipper_length_mm ~ bm_g_c + bd_c | 339.000 | 13,662.581 | NA | NA | NA | NA |
| flipper_length_mm ~ bm_g_c * bd_c | 338.000 | 13,631.884 | 1.000 | 30.697 | 0.761 | 0.384 |
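A comparison like the one in the table can be sketched with base R's `anova()`, which carries out the F-test for nested models. The variable names `bm_g_c` and `bd_c` come from the table; `penguins_c`, `reduced`, `full`, and `f_test` are illustrative names.

```r
library(palmerpenguins)

# Center both covariates at their sample means
penguins_c <- penguins
penguins_c$bm_g_c <- penguins_c$body_mass_g -
  mean(penguins_c$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins_c$bill_depth_mm -
  mean(penguins_c$bill_depth_mm, na.rm = TRUE)

# Reduced model: main effects only; full model: main effects plus interaction
reduced <- lm(flipper_length_mm ~ bm_g_c + bd_c, data = penguins_c)
full    <- lm(flipper_length_mm ~ bm_g_c * bd_c, data = penguins_c)

# F-test of H0: the interaction coefficient equals 0
f_test <- anova(reduced, full)
f_test
```

With a p-value of 0.384, there is insufficient evidence that bill depth modifies the association between body mass and flipper length.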
Part k
Using only body mass and species as covariates, write out the model that we would fit including the main effects of body mass and species and their interaction. How many coefficients are tested when we test for a significant interaction?
Hint: Homework 4 can help guide us with the species’ categories.
2 (one interaction coefficient for each non-reference species, Chinstrap and Gentoo)
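One way to write the model for this part is shown below. The notation is an assumption made here: \(BM^c\) denotes centered body mass, \(I(\cdot)\) is an indicator for the named species, and Adelie is taken as the reference species.

\[
FL = \beta_0 + \beta_1 BM^c + \beta_2 I(\text{Chinstrap}) + \beta_3 I(\text{Gentoo}) + \beta_4 BM^c \cdot I(\text{Chinstrap}) + \beta_5 BM^c \cdot I(\text{Gentoo}) + \epsilon
\]

Species has three categories, so it enters through two indicators and the interaction adds two terms. The test for interaction has the null hypothesis \(H_0: \beta_4 = \beta_5 = 0\).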
Part l
Using only body mass and species as covariates, test if species is an effect modifier.
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| flipper_length_mm ~ bm_g_c + species | 338.000 | 9,839.073 | NA | NA | NA | NA |
| flipper_length_mm ~ bm_g_c * species | 336.000 | 9,611.166 | 2.000 | 227.907 | 3.984 | 0.020 |
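As a check on the table, the F-statistic can be recomputed by hand from the residual sums of squares and residual degrees of freedom of the two models. The subscripts "red" and "full" are labels introduced here for the model without and with the interaction.

\[
F = \frac{(RSS_{red} - RSS_{full}) / (df_{red} - df_{full})}{RSS_{full} / df_{full}}
  = \frac{(9839.073 - 9611.166) / (338 - 336)}{9611.166 / 336}
  = \frac{227.907 / 2}{28.605}
  \approx 3.984
\]

Assuming a 5% significance level (a threshold not stated in the problem), the p-value of 0.020 gives evidence that at least one interaction coefficient is nonzero, which suggests species is an effect modifier of the association between body mass and flipper length.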
Part m
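Using the results in the above parts, we will move forward with the following model, where \(BM^c\) and \(BD^c\) denote centered body mass and centered bill depth, \(I(\cdot)\) is a species indicator, and Adelie is the reference species:
\[
FL = \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 I(\text{Chinstrap}) + \beta_4 I(\text{Gentoo}) + \beta_5 BM^c \cdot I(\text{Chinstrap}) + \beta_6 BM^c \cdot I(\text{Gentoo}) + \epsilon
\]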
Run the above model and display the regression table output.
Please note that this is not exactly the best method for selecting a model. I just wanted to step us through a similar thought process.
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| Centered body mass (g) | 0.005 | 0.003, 0.007 | <0.001 |
| Centered bill depth (mm) | 1.177 | 0.536, 1.818 | <0.001 |
| Species | | | |
| Adelie | — | — | |
| Chinstrap | 7.933 | 5.585, 10.28 | <0.001 |
| Gentoo | 22.29 | 18.18, 26.39 | <0.001 |
| Centered body mass (g) * Species | | | |
| Centered body mass (g) * Chinstrap | 0.005 | 0.001, 0.009 | 0.011 |
| Centered body mass (g) * Gentoo | 0.003 | 0.000, 0.005 | 0.060 |

Abbreviation: CI = Confidence Interval
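A sketch of fitting a model with the terms listed in the table follows. The formula is inferred from the table rows (centered body mass, centered bill depth, species, and the body mass by species interaction) and is an assumption. `penguins_c` and `final_model` are illustrative names.

```r
library(palmerpenguins)

# Center body mass and bill depth at their sample means
penguins_c <- penguins
penguins_c$bm_g_c <- penguins_c$body_mass_g -
  mean(penguins_c$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins_c$bill_depth_mm -
  mean(penguins_c$bill_depth_mm, na.rm = TRUE)

# Main effects plus the body mass by species interaction (Adelie is the
# reference level because it is the first factor level)
final_model <- lm(
  flipper_length_mm ~ bm_g_c + bd_c + species + bm_g_c:species,
  data = penguins_c
)

# Estimates with 95% confidence intervals, including the intercept
cbind(estimate = coef(final_model), confint(final_model))
```

This output includes the intercept row, which the table above omits, giving the 7 coefficients interpreted in Part n.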
Part n
Interpret each coefficient in the model above. There should be 7 total interpretations.
A few examples:
\(\widehat\beta_1\): For Adelie penguins, the expected flipper length increases by 0.0049 mm for every 1 g increase in body mass, adjusting for bill depth (95% CI: 0.0028, 0.007).
Note: Since bill depth is not in an interaction with body mass, we only need to adjust for bill depth. While this relationship holds for the mean bill depth, it is important to say we are adjusting for bill depth.
\(\widehat\beta_3\): For penguins with a body mass of 4201.75 g, the expected flipper length is 7.93 mm greater comparing Chinstrap penguins to Adelie penguins, adjusting for bill depth (95% CI: 5.59, 10.28).
\(\widehat\beta_6\): The increase in expected flipper length for every 1 g increase in body mass is 0.0025 mm greater for Gentoo penguins than for Adelie penguins, adjusting for bill depth (95% CI: -0.0001, 0.0051).
Part o
For Chinstrap penguins, what is the effect of centered body mass? Use estimable() to find the 95% confidence interval for the effect. Interpret the effect.
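For Chinstrap penguins, the body mass slope is the sum of the body mass main effect and the body mass by Chinstrap interaction coefficient. A minimal sketch using `estimable()` from the gmodels package (assumed to be installed) is shown below. The model formula and the names `penguins_c`, `final_model`, and `chinstrap_slope` are assumptions made for illustration.

```r
library(palmerpenguins)
library(gmodels)

# Center body mass and bill depth at their sample means
penguins_c <- penguins
penguins_c$bm_g_c <- penguins_c$body_mass_g -
  mean(penguins_c$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins_c$bill_depth_mm -
  mean(penguins_c$bill_depth_mm, na.rm = TRUE)

final_model <- lm(
  flipper_length_mm ~ bm_g_c + bd_c + species + bm_g_c:species,
  data = penguins_c
)

# Linear combination: 1 * (body mass slope) + 1 * (body mass x Chinstrap).
# Coefficients not named here get weight 0.
chinstrap_slope <- estimable(
  final_model,
  c("bm_g_c" = 1, "bm_g_c:speciesChinstrap" = 1),
  conf.int = 0.95
)
chinstrap_slope
```

Naming the coefficients in the contrast avoids counting positions in the coefficient vector. The interpretation follows the same pattern as the Adelie slope in Part n, with the combined estimate and its confidence interval in place of \(\widehat\beta_1\).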