Homework 5 Answers

BSTA 512/612

Author

Your name here!!!

Modified

February 26, 2026

Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.

Questions

The purpose of the problem below is to integrate what we have learned so far into a simple process that might be embedded in our analysis. This will help you see how many of our learning objectives connect in a single workflow. Some of the things we have learned that will be covered:

  • Choosing what to test
  • Interpretations of coefficients (with and without other covariates in the model)
  • F-test procedures and conclusions
  • Testing if a covariate is an effect modifier, confounder, or nothing

Question 1

We are going to revisit the Palmer Penguins dataset from Homework 4. This question covers choosing what to test, interpretations of coefficients, F-test conclusions, and interactions.

For this problem we will be using the penguins dataset from the palmerpenguins R package. We will look at the association between flipper length of penguins (measured in mm) and specific species of penguins.

Description from help file:

Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.

More info about the data.

# first install the palmerpenguins package
# install.packages("palmerpenguins")
library(palmerpenguins)

Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':

    penguins, penguins_raw
data(penguins)

# run the command below to learn more about the variables in the penguins dataset
# ?penguins

Part a

Make a plot of flipper length (outcome) and body mass (explanatory variable). Discuss what you see in the plot.
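One way to draw this plot is sketched below. It assumes the ggplot2 package is installed; the object names `peng` and `p_a` and the added straight-line smoother are illustrative choices, not part of the assignment.

```r
library(palmerpenguins)
library(ggplot2)

# Keep only rows where both variables are observed (avoids ggplot's missing-value warning)
peng <- subset(penguins, !is.na(body_mass_g) & !is.na(flipper_length_mm))

# Scatterplot with body mass on the x-axis and flipper length (outcome) on the y-axis
p_a <- ggplot(peng, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # straight-line reference fit
  labs(x = "Body mass (g)", y = "Flipper length (mm)")
p_a
```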

Part b

Write the simple linear regression model that we will fit for the association between body mass and flipper length. If you use any shorthand, please write it out. For example: Let \(BD\) represent bill depth.

\[ FL = \beta_0 + \beta_1 BM + \epsilon \]

where \(FL\) represents flipper length (mm) and \(BM\) represents body mass (g).

Part c

Run the simple linear regression model for the association between body mass and flipper length. Display the regression table output.

Characteristic Beta 95% CI p-value
body_mass_g 0.02 0.01, 0.02 <0.001
Abbreviation: CI = Confidence Interval
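A minimal base R sketch of this fit follows. The table above appears to come from a formatting package, so the layout here will differ; `fit_slr` and `slr_table` are illustrative names.

```r
library(palmerpenguins)

# Simple linear regression: flipper length (outcome) on body mass
fit_slr <- lm(flipper_length_mm ~ body_mass_g, data = penguins)

# Estimates, standard errors, t statistics, p-values, and 95% confidence limits
slr_table <- cbind(summary(fit_slr)$coefficients,
                   confint(fit_slr, level = 0.95))
round(slr_table, 3)
```

Displayed to two decimals, the slope rounds to 0.02, which is why a third decimal is useful when interpreting it.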

Part d

Interpret the coefficient for body mass. Note that as we move forward with a multivariate model, we will refer to this estimate as the crude or unadjusted coefficient estimate.

Not given

Part e

Discuss how centering body mass might help with interpretability. Then, center body mass around the mean, run the model again, and display the regression table. Do the intercept and/or slope change from Part c?

term estimate std.error statistic p.value conf.low conf.high
(Intercept) 200.915 0.374 537.446 0.000 200.180 201.651
bm_g_c 0.015 0.000 32.722 0.000 0.014 0.016
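The sketch below centers body mass and compares the two fits side by side. The variable name `bm_g_c` matches the table above; `peng`, `fit_c`, `fit_raw`, and `comparison` are illustrative names.

```r
library(palmerpenguins)

peng <- penguins
# Center body mass at its sample mean (missing values ignored)
peng$bm_g_c <- peng$body_mass_g - mean(peng$body_mass_g, na.rm = TRUE)

fit_c   <- lm(flipper_length_mm ~ bm_g_c, data = peng)       # centered predictor
fit_raw <- lm(flipper_length_mm ~ body_mass_g, data = peng)  # original predictor

# Intercepts and slopes side by side
comparison <- rbind(centered   = unname(coef(fit_c)),
                    uncentered = unname(coef(fit_raw)))
colnames(comparison) <- c("intercept", "slope")
round(comparison, 3)
```

Centering only shifts the predictor by a constant, so the slope is identical in both fits; the intercept becomes the expected flipper length at the mean body mass rather than at a body mass of 0 g.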

Part f

Make a plot of flipper length (outcome) and body mass (explanatory variable) by bill depth. Discuss what you see in the plot. (Hint: bill depth will be the color in the plot.)
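A possible version of this plot, assuming ggplot2 is installed. The viridis color scale is an arbitrary choice, and `peng` and `p_f` are illustrative names.

```r
library(palmerpenguins)
library(ggplot2)

# Keep rows where all three plotted variables are observed
peng <- subset(penguins,
               !is.na(body_mass_g) & !is.na(flipper_length_mm) & !is.na(bill_depth_mm))

# Bill depth is continuous, so it maps to a color gradient
p_f <- ggplot(peng, aes(x = body_mass_g, y = flipper_length_mm, color = bill_depth_mm)) +
  geom_point() +
  scale_color_viridis_c() +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Bill depth (mm)")
p_f
```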

Part g

Make a plot of flipper length (outcome) and body mass (explanatory variable) by penguin species. Discuss what you see in the plot and relate it back to the plot in Part f.
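A sketch of the species version, again assuming ggplot2. Adding one fitted line per species is an optional extra that makes it easier to compare slopes by eye; `peng` and `p_g` are illustrative names.

```r
library(palmerpenguins)
library(ggplot2)

peng <- subset(penguins, !is.na(body_mass_g) & !is.na(flipper_length_mm))

# Species is categorical, so each species gets its own color and its own fitted line
p_g <- ggplot(peng, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Body mass (g)", y = "Flipper length (mm)", color = "Species")
p_g
```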

Part h

Using only body mass and bill depth as covariates, write out the model that we would fit including the main effects of body mass and bill depth and their interaction. How many coefficients are tested when we test for a significant interaction?

Note

Both covariates should be centered. For the rest of the homework, we will use the centered body mass and bill depth.

1
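For reference, one way to write this model, using \(BM^c\) for centered body mass and \(BD^c\) for centered bill depth (the coefficient numbering here is one possible choice):

\[ FL = \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 BM^c \cdot BD^c + \epsilon \]

Because both covariates are continuous, the interaction is a single product term, so the test of interaction is a test of \(H_0: \beta_3 = 0\).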

Part i

Center bill depth.

Not given
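A minimal sketch of the centering step. The name `bd_c` matches the model formulas shown in Part j; `peng` is an illustrative name for the working copy of the data.

```r
library(palmerpenguins)

peng <- penguins
# Subtract the sample mean bill depth (missing values ignored) from each observation
peng$bd_c <- peng$bill_depth_mm - mean(peng$bill_depth_mm, na.rm = TRUE)

# Sanity check: the centered variable should average to (numerically) zero
mean(peng$bd_c, na.rm = TRUE)
```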

Part j

Using only body mass and bill depth as covariates, test if bill depth is an effect modifier.

term df.residual rss df sumsq statistic p.value
flipper_length_mm ~ bm_g_c + bd_c 339.000 13,662.581 NA NA NA NA
flipper_length_mm ~ bm_g_c * bd_c 338.000 13,631.884 1.000 30.697 0.761 0.384
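A comparison like the one in the table can be sketched with `anova()` on two nested `lm()` fits. The object names are illustrative, and the printed layout will differ from the tidied table above.

```r
library(palmerpenguins)

peng <- penguins
peng$bm_g_c <- peng$body_mass_g - mean(peng$body_mass_g, na.rm = TRUE)
peng$bd_c   <- peng$bill_depth_mm - mean(peng$bill_depth_mm, na.rm = TRUE)

# Reduced model: main effects only; full model: main effects plus interaction
fit_reduced <- lm(flipper_length_mm ~ bm_g_c + bd_c, data = peng)
fit_full    <- lm(flipper_length_mm ~ bm_g_c * bd_c, data = peng)

# General linear F-test of H0: the interaction coefficient equals 0
f_test_bd <- anova(fit_reduced, fit_full)
f_test_bd
```

With a p-value of 0.384, there is not sufficient evidence that bill depth modifies the association between body mass and flipper length.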

Part k

Using only body mass and species as covariates, write out the model that we would fit including the main effects of body mass and species and their interaction. How many coefficients are tested when we test for a significant interaction?

Hint: Homework 4 can help guide us with the species’ categories.

2
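For reference, one way to write this model, taking Adelie as the reference species and using \(BM^c\) for centered body mass (the coefficient numbering here applies to this two-covariate model only):

\[\begin{aligned} FL = & \beta_0 + \beta_1 BM^c + \beta_2 I(\textrm{Chinstrap}) + \beta_3 I(\textrm{Gentoo}) + \\ & \beta_4 BM^c \cdot I(\textrm{Chinstrap}) + \beta_5 BM^c \cdot I(\textrm{Gentoo}) + \epsilon \end{aligned}\]

Species has three categories, so it enters through two indicators, and the test of interaction is a joint test of \(H_0: \beta_4 = \beta_5 = 0\).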

Part l

Using only body mass and species as covariates, test if species is an effect modifier.

term df.residual rss df sumsq statistic p.value
flipper_length_mm ~ bm_g_c + species 338.000 9,839.073 NA NA NA NA
flipper_length_mm ~ bm_g_c * species 336.000 9,611.166 2.000 227.907 3.984 0.020
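As a check on the table, the F statistic can be recomputed by hand from the residual sums of squares and residual degrees of freedom shown above. The variable names below are illustrative.

```r
# Values copied from the table above
rss_reduced <- 9839.073  # main effects only
rss_full    <- 9611.166  # main effects plus interaction
df_reduced  <- 338
df_full     <- 336

# Numerator df = number of interaction coefficients being tested
num_df <- df_reduced - df_full

# F = [(RSS_reduced - RSS_full) / num_df] / [RSS_full / df_full]
f_stat  <- ((rss_reduced - rss_full) / num_df) / (rss_full / df_full)
p_value <- pf(f_stat, df1 = num_df, df2 = df_full, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

The p-value is below 0.05, so at that significance level there is evidence that species modifies the association between body mass and flipper length.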

Part m

Using the results in the above parts, we will move forward with the following model:

\[\begin{aligned} FL = & \beta_0 + \beta_1 BM^c + \beta_2 BD^c + \beta_3 I(\textrm{Chinstrap}) + \beta_4 I(\textrm{Gentoo}) + \\ & \beta_5 BM^c \cdot I(\textrm{Chinstrap}) + \beta_6 BM^c \cdot I(\textrm{Gentoo}) + \epsilon \end{aligned}\]

Run the above model and display the regression table output.

Please note that this is not exactly the best method for selecting a model. I just wanted to step us through a similar thought process.

Characteristic Beta 95% CI p-value
Centered body mass (g) 0.005 0.003, 0.007 <0.001
Centered bill depth (mm) 1.177 0.536, 1.818 <0.001
Species
    Adelie (reference)
    Chinstrap 7.933 5.585, 10.28 <0.001
    Gentoo 22.29 18.18, 26.39 <0.001
Centered body mass (g) * Species
    Centered body mass (g) * Chinstrap 0.005 0.001, 0.009 0.011
    Centered body mass (g) * Gentoo 0.003 0.000, 0.005 0.060
Abbreviation: CI = Confidence Interval
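A base R sketch of fitting this model. The table above appears to come from a formatting package, so only the numbers, not the layout, should match; `peng`, `fit_final`, and `final_table` are illustrative names.

```r
library(palmerpenguins)

peng <- penguins
peng$bm_g_c <- peng$body_mass_g - mean(peng$body_mass_g, na.rm = TRUE)
peng$bd_c   <- peng$bill_depth_mm - mean(peng$bill_depth_mm, na.rm = TRUE)

# bm_g_c * species expands to both main effects plus their interaction;
# Adelie is the reference level because it is the first factor level
fit_final <- lm(flipper_length_mm ~ bm_g_c * species + bd_c, data = peng)

# Estimates with 95% confidence limits
final_table <- cbind(estimate = coef(fit_final), confint(fit_final, level = 0.95))
round(final_table, 3)
```

R lists the coefficients in formula order, so the bill depth row appears after the species main effects rather than second as in the table above.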

Part n

Interpret each coefficient in the model above. There should be 7 total interpretations.

A few examples:

  • \(\widehat\beta_1\): For Adelie penguins, the expected flipper length increases by 0.0049 mm for every 1 g increase in body mass, adjusting for bill depth (95% CI: 0.0028, 0.007).

    • Note: Since bill depth is not in an interaction with body mass, we only need to adjust for bill depth. While this relationship holds for the mean bill depth, it is important to say we are adjusting for bill depth.
  • \(\widehat\beta_3\): For penguins with a body mass of 4201.75 g, the expected flipper length is 7.93 mm greater comparing Chinstrap penguins to Adelie penguins, adjusting for bill depth (95% CI: 5.59, 10.28).

  • \(\widehat\beta_6\): The difference in the change in expected flipper length per 1 g increase in body mass, comparing Gentoo penguins to Adelie penguins, is 0.0025 mm, adjusting for bill depth (95% CI: -0.0001, 0.0051).
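It can help to write out the body mass slope implied by the Part m model for each species, holding bill depth fixed. Setting the species indicators to 0 or 1 and collecting the terms that multiply \(BM^c\) gives:

\[\begin{aligned} \textrm{Adelie:} \quad & \beta_1 \\ \textrm{Chinstrap:} \quad & \beta_1 + \beta_5 \\ \textrm{Gentoo:} \quad & \beta_1 + \beta_6 \end{aligned}\]

This is why \(\beta_1\) is interpreted for Adelie penguins only, and why \(\beta_5\) and \(\beta_6\) are differences in slopes relative to Adelie penguins.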

Part o

For Chinstrap penguins, what is the effect of centered body mass? Use estimable() to find the 95% confidence interval for the effect. Interpret the effect.

0.0098 mm
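The effect for Chinstrap penguins is the linear combination \(\beta_1 + \beta_5\). The sketch below computes it and its 95% confidence interval by hand from the coefficient covariance matrix, then calls `estimable()` if the gmodels package is available. The object names and the contrast label are illustrative, and passing the contrast as a one-row matrix is one of several input forms that `estimable()` accepts.

```r
library(palmerpenguins)

peng <- penguins
peng$bm_g_c <- peng$body_mass_g - mean(peng$body_mass_g, na.rm = TRUE)
peng$bd_c   <- peng$bill_depth_mm - mean(peng$bill_depth_mm, na.rm = TRUE)
fit_final <- lm(flipper_length_mm ~ bm_g_c * species + bd_c, data = peng)

# Contrast vector: 1 for beta_1 and for the Chinstrap interaction, 0 elsewhere
cm <- setNames(rep(0, length(coef(fit_final))), names(coef(fit_final)))
cm[c("bm_g_c", "bm_g_c:speciesChinstrap")] <- 1

# By hand: estimate, standard error, and t-based 95% confidence interval
est <- sum(cm * coef(fit_final))
se  <- sqrt(drop(t(cm) %*% vcov(fit_final) %*% cm))
ci  <- est + c(-1, 1) * qt(0.975, df.residual(fit_final)) * se
round(c(estimate = est, lower = ci[1], upper = ci[2]), 4)

# Same contrast through gmodels::estimable(), if the package is installed
if (requireNamespace("gmodels", quietly = TRUE)) {
  print(gmodels::estimable(fit_final, rbind("BM slope, Chinstrap" = cm), conf.int = 0.95))
}
```

The interpretation then follows the pattern used for \(\widehat\beta_1\): for Chinstrap penguins, expected flipper length increases by the estimated amount for every 1 g increase in body mass, adjusting for bill depth, reported together with the confidence interval.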