# first install the palmerpenguins package
# install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)
# run the command below to learn more about the variables in the penguins dataset
# ?penguins
Homework 5 Answers
BSTA 512/612
Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.
Questions
The purpose of the problem below is to integrate what we have learned so far into a simple process that might be embedded into our analysis. This will help you see how many of our learning objectives connect into a single workflow. Some of the things we have learned that will be covered:
- Choosing what to test
- Interpretations of coefficients (with and without other covariates in the model)
- F-test procedures and conclusions
- Testing if a covariate is an effect modifier, confounder, or nothing
Question 1
We are going to revisit the Palmer Penguins dataset from Homework 4. This question covers choosing what to test, interpretations of coefficients, F-test conclusions, and interactions.
For this problem we will be using the penguins dataset from the palmerpenguins R package. We will look at the association between flipper length of penguins (measured in mm) and specific species of penguins.
Description from help file:
Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
Part a
Make a plot of flipper length (outcome) and body mass (explanatory variable). Discuss what you see in the plot.
Part b
Write the simple linear regression model that we will fit for the association between body mass and flipper length. If you use any short hand, please write it out. For example: Let \(BD\) represent bill depth.
\[ FL = \beta_0 + \beta_1 BM + \epsilon \]
Part c
Run the simple linear regression model for the association between body mass and flipper length. Display the regression table output.
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| body_mass_g | 0.02 | 0.01, 0.02 | <0.001 |
¹ CI = Confidence Interval
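A minimal sketch of how the Part c model could be fit, assuming base R's `lm()` and `confint()` rather than whatever table package produced the output above. Printing four decimals makes the slope easier to read than the two-decimal table does.

```r
library(palmerpenguins)  # provides the penguins data frame

# Simple linear regression: flipper length (mm) on body mass (g).
# lm() drops rows with missing values by default.
slr <- lm(flipper_length_mm ~ body_mass_g, data = penguins)

# Estimates with 95% confidence intervals, using base R only
slr_table <- cbind(estimate = coef(slr), confint(slr, level = 0.95))
print(round(slr_table, 4))
```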
Part d
Interpret the coefficient for body mass. Note that as we move forward with a multivariate model, we will refer to this estimate as the crude or unadjusted coefficient estimate.
Not given
Part e
Discuss how centering body mass might help with interpretability. Then, center body mass around the mean, run the model again, and display the regression table. Do the intercept and/or slope change from Part c?
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 200.915 | 0.374 | 537.446 | 0.000 | 200.180 | 201.651 |
| bm_g_c | 0.015 | 0.000 | 32.722 | 0.000 | 0.014 | 0.016 |
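The comparison asked for in Part e can be sketched as follows. The variable name `bm_g_c` is taken from the table above; the data frame name `penguins_c` is a hypothetical choice for this example. With a centered predictor, the intercept becomes the expected flipper length at the mean body mass, while the slope is unchanged.

```r
library(palmerpenguins)

# Center body mass at its sample mean (missing values ignored)
penguins_c <- penguins
penguins_c$bm_g_c <- penguins$body_mass_g -
  mean(penguins$body_mass_g, na.rm = TRUE)

fit_raw <- lm(flipper_length_mm ~ body_mass_g, data = penguins_c)
fit_c   <- lm(flipper_length_mm ~ bm_g_c, data = penguins_c)

# Put both fits side by side: the slopes agree, only the intercept moves
coefs <- rbind(uncentered = coef(fit_raw), centered = coef(fit_c))
colnames(coefs) <- c("intercept", "slope")
print(coefs)
```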
Part f
Make a plot of flipper length (outcome) and body mass (explanatory variable) by bill depth. Discuss what you see in the plot. (Hint: bill depth will be the color in the plot.)
Part g
Make a plot of flipper length (outcome) and body mass (explanatory variable) by penguin species. Discuss what you see in the plot and relate it back to the plot in Part f.
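One possible way to draw the Part g plot, assuming the ggplot2 package (this key does not say which plotting package to use). Swapping `color = species` for `color = bill_depth_mm` gives a version of the Part f plot; in that case the per-group `geom_smooth()` layer should be dropped, since bill depth is continuous.

```r
library(palmerpenguins)
library(ggplot2)  # assumed plotting package

# Scatterplot of flipper length against body mass, colored by species,
# with a separate least-squares line for each species
p_species <- ggplot(
  penguins,
  aes(x = body_mass_g, y = flipper_length_mm, color = species)
) +
  geom_point(alpha = 0.7, na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE) +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Species")

p_species
```

Lines that are not parallel across species are a visual hint of an interaction between body mass and species.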
Part h
Using only body mass and bill depth as covariates, write out the model that we would fit including the main effects of body mass and bill depth and their interaction. How many coefficients are tested when we test for a significant interaction?
Both covariates should be centered. For the rest of the homework, we will use the centered body mass and bill depth.
1
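One way to write the Part h model, using \(BM^c\) and \(BD^c\) for centered body mass and centered bill depth (the same shorthand used in Part m):

\[ FL = \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 BM^c \cdot BD^c + \epsilon \]

Testing for a significant interaction is then a test of \(H_0: \beta_3 = 0\), which is the single coefficient counted in the answer above.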
Part i
Center bill depth.
Not given
Part j
Using only body mass and bill depth as covariates, test if bill depth is an effect modifier.
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| flipper_length_mm ~ bm_g_c + bd_c | 339.000 | 13,662.581 | NA | NA | NA | NA |
| flipper_length_mm ~ bm_g_c * bd_c | 338.000 | 13,631.884 | 1.000 | 30.697 | 0.761 | 0.384 |
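A sketch of how the Part j comparison could be run with base R's `anova()`, assuming both covariates are centered at their sample means. The names `bm_g_c` and `bd_c` follow the table above; `penguins_c`, `reduced`, and `full` are hypothetical names for this example.

```r
library(palmerpenguins)

# Center both covariates at their sample means
penguins_c <- penguins
penguins_c$bm_g_c <- penguins$body_mass_g -
  mean(penguins$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins$bill_depth_mm -
  mean(penguins$bill_depth_mm, na.rm = TRUE)

reduced <- lm(flipper_length_mm ~ bm_g_c + bd_c, data = penguins_c)
full    <- lm(flipper_length_mm ~ bm_g_c * bd_c, data = penguins_c)

# Partial F-test for the single interaction coefficient
f_test <- anova(reduced, full)
print(f_test)

# The same statistic by hand, using the values in the table above:
# (sumsq / df) divided by (full-model rss / full-model df.residual)
f_by_hand <- (30.697 / 1) / (13631.884 / 338)
print(f_by_hand)
```

With a p-value of 0.384, the table does not give evidence that bill depth modifies the effect of body mass, which is consistent with Part m keeping bill depth as a main effect only.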
Part k
Using only body mass and species as covariates, write out the model that we would fit including the main effects of body mass and species and their interaction. How many coefficients are tested when we test for a significant interaction?
Hint: Homework 4 can help guide us with the species’ categories.
2
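One way to write the Part k model, assuming Adelie is the reference species (as in the Part m table) and using \(BM^c\) for centered body mass:

\[\begin{aligned} FL = & \beta_0 + \beta_1 BM^c + \beta_2 I(\textrm{Chinstrap}) + \beta_3 I(\textrm{Gentoo}) + \\ & \beta_4 BM^c \cdot I(\textrm{Chinstrap}) + \beta_5 BM^c \cdot I(\textrm{Gentoo}) + \epsilon \end{aligned}\]

Testing for a significant interaction is then a joint test of \(H_0: \beta_4 = \beta_5 = 0\), which accounts for the two coefficients in the answer above.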
Part l
Using only body mass and species as covariates, test if species is an effect modifier.
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| flipper_length_mm ~ bm_g_c + species | 338.000 | 9,839.073 | NA | NA | NA | NA |
| flipper_length_mm ~ bm_g_c * species | 336.000 | 9,611.166 | 2.000 | 227.907 | 3.984 | 0.020 |
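As a check, the F statistic in the table above can be reproduced by hand from the residual sums of squares and degrees of freedom of the reduced (main effects) and full (interaction) models:

\[ F = \frac{(RSS_{red} - RSS_{full}) / (df_{red} - df_{full})}{RSS_{full} / df_{full}} = \frac{(9839.073 - 9611.166) / 2}{9611.166 / 336} = \frac{113.954}{28.605} \approx 3.984 \]

Assuming a 0.05 significance level, the p-value of 0.020 leads us to reject \(H_0\) and conclude that species is an effect modifier of the association between body mass and flipper length. This matches the model in Part m, which keeps the body mass by species interaction.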
Part m
Using the results in the above parts, we will move forward with the following model:
\[\begin{aligned} FL = & \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 I(\textrm{Chinstrap}) + \beta_4 I(\textrm{Gentoo}) + \\ & \beta_5 BM^c \cdot I(\textrm{Chinstrap}) + \beta_6 BM^c \cdot I(\textrm{Gentoo}) + \epsilon \end{aligned}\]
Run the above model and display the regression table output.
Please note that this is not exactly the best method for selecting a model. I just wanted to step us through a similar thought process.
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| Centered body mass (g) | 0.005 | 0.003, 0.007 | <0.001 |
| Centered bill depth (mm) | 1.177 | 0.536, 1.818 | <0.001 |
| Species | |||
| Adelie | — | — | |
| Chinstrap | 7.933 | 5.585, 10.28 | <0.001 |
| Gentoo | 22.29 | 18.18, 26.39 | <0.001 |
| Centered body mass (g) * Species | |||
| Centered body mass (g) * Chinstrap | 0.005 | 0.001, 0.009 | 0.011 |
| Centered body mass (g) * Gentoo | 0.003 | 0.000, 0.005 | 0.060 |
¹ CI = Confidence Interval
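A minimal sketch of fitting the Part m model with base R, assuming both covariates are centered at their sample means. The names `penguins_c` and `final_fit` are hypothetical. The output includes the intercept, which the table above does not display, giving seven coefficients in total.

```r
library(palmerpenguins)

# Center both continuous covariates at their sample means
penguins_c <- penguins
penguins_c$bm_g_c <- penguins$body_mass_g -
  mean(penguins$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins$bill_depth_mm -
  mean(penguins$bill_depth_mm, na.rm = TRUE)

# Main effects for body mass, bill depth, and species,
# plus the body mass by species interaction
final_fit <- lm(
  flipper_length_mm ~ bm_g_c + bd_c + species + bm_g_c:species,
  data = penguins_c
)

# Estimates with 95% confidence intervals
final_table <- cbind(estimate = coef(final_fit), confint(final_fit))
print(round(final_table, 4))
```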
Part n
Interpret each coefficient in the model above. There should be 7 total interpretations.
A few examples:
\(\widehat\beta_1\): For Adelie penguins, the expected flipper length increases by 0.0049 mm for every 1 g increase in body mass, adjusting for bill depth (95% CI: 0.0028, 0.007).
- Note: Since bill depth is not in an interaction with body mass, we only need to adjust for bill depth. While this relationship holds for the mean bill depth, it is important to say we are adjusting for bill depth.
\(\widehat\beta_3\): For penguins with a body mass of 4201.75 g, the expected flipper length is 7.93 mm greater comparing Chinstrap penguins to Adelie penguins, adjusting for bill depth (95% CI: 5.59, 10.28).
\(\widehat\beta_6\): The difference in the effect of body mass on flipper length, comparing Gentoo penguins to Adelie penguins, is 0.0025 mm per 1 g increase in body mass, adjusting for bill depth (95% CI: -0.0001, 0.0051).
Part o
For Chinstrap penguins, what is the effect of centered body mass? Use estimable() to find the 95% confidence interval for the effect. Interpret the effect.
0.0098 mm
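For Chinstrap penguins, the effect of centered body mass is the linear combination \(\beta_1 + \beta_5\) from the Part m model. The sketch below computes that estimate and a 95% confidence interval by hand with base R. It is an equivalent calculation under the usual linear model assumptions, not the `estimable()` call the question asks for, and the object names are hypothetical.

```r
library(palmerpenguins)

# Rebuild the centered covariates and the Part m model
penguins_c <- penguins
penguins_c$bm_g_c <- penguins$body_mass_g -
  mean(penguins$body_mass_g, na.rm = TRUE)
penguins_c$bd_c <- penguins$bill_depth_mm -
  mean(penguins$bill_depth_mm, na.rm = TRUE)
final_fit <- lm(
  flipper_length_mm ~ bm_g_c + bd_c + species + bm_g_c:species,
  data = penguins_c
)

# Contrast vector that picks out beta_1 + beta_5
b  <- coef(final_fit)
cm <- setNames(rep(0, length(b)), names(b))
cm[c("bm_g_c", "bm_g_c:speciesChinstrap")] <- 1

# Estimate, standard error, and t-based 95% confidence interval
est   <- sum(cm * b)
se    <- sqrt(drop(t(cm) %*% vcov(final_fit) %*% cm))
tcrit <- qt(0.975, df = df.residual(final_fit))
chinstrap_effect <- c(
  estimate = est,
  lower    = est - tcrit * se,
  upper    = est + tcrit * se
)
print(round(chinstrap_effect, 4))
```

The estimate reads as the expected change in flipper length (mm) for every 1 g increase in body mass among Chinstrap penguins, adjusting for bill depth.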