Homework 4 Answers

BSTA 512/612

Due: Friday February 28, 2025 at 11pm
Author: Your name here!!!

Modified: January 29, 2026

Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.

Questions

Question 1

# read_sas() comes from the haven package; here() comes from the here package
dep_df <- haven::read_sas(here::here("data/completedata.sas7bdat"))

Part a

term          estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)     6.4144     2.0501     3.1288   0.0018    2.3882    10.4406
Fatalism        0.1527     0.0452     3.3784   0.0008    0.0639     0.2414
Optimism       -0.3179     0.0722    -4.4058   0.0000   -0.4596    -0.1762
Spirituality    0.3587     0.1291     2.7781   0.0056    0.1051     0.6122

Another fun way to display:

tbl_regression(q2_mod_f1, intercept = TRUE)
Characteristic  Beta   95% CI        p-value
(Intercept)     6.4    2.4, 10       0.002
Fatalism        0.15   0.06, 0.24    <0.001
Optimism        -0.32  -0.46, -0.18  <0.001
Spirituality    0.36   0.11, 0.61    0.006
Abbreviation: CI = Confidence Interval
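
As a quick consistency check on the first table, each test statistic is the estimate divided by its standard error, and each 95% confidence limit is the estimate plus or minus a t critical value times the standard error. The sketch below redoes that arithmetic for the Fatalism row using the rounded values printed above. The 608 residual degrees of freedom are taken from the ANOVA tables in Question 2, and small differences in the last digit come from rounding.

```r
# Values copied from the Fatalism row of the Part a table (rounded to 4 digits)
estimate  <- 0.1527
std_error <- 0.0452
df_resid  <- 608   # residual df of the full model, as shown in the Question 2 tables

# Test statistic for H0: beta_1 = 0
t_stat <- estimate / std_error

# Two-sided p-value and 95% confidence interval from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
t_crit  <- qt(0.975, df = df_resid)
ci      <- estimate + c(-1, 1) * t_crit * std_error

round(c(statistic = t_stat, p.value = p_value, conf.low = ci[1], conf.high = ci[2]), 4)
```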

Part b

  • \(\beta_0\): The expected depression score is 6.4 when fatalism, optimism, and spirituality scores are 0 (95% CI: 2.4, 10.4).

    • Same as homework 2: The intercept does not make sense. A score of 0 is outside the range of possible scores for fatalism, optimism, and spirituality.
  • \(\beta_1\): For every 1 point higher fatalism score, there is an expected difference of 0.15 points higher depression score, adjusting for optimism and spirituality score (95% CI: 0.06, 0.24).

Part c

Not given

Part d

\[\begin{aligned} \widehat{\text{Depression}} &= 5.39 + 0.15 \cdot \text{Fatalism} \end{aligned}\]

Question 2

Part a

Fit the regression model with all the covariates (Fatalism, Optimism, Spirituality), display the regression table, and write out the fitted regression line.

Characteristic  Beta   95% CI        p-value
(Intercept)     6.4    2.4, 10       0.002
Fatalism        0.15   0.06, 0.24    <0.001
Optimism        -0.32  -0.46, -0.18  <0.001
Spirituality    0.36   0.11, 0.61    0.006
Abbreviation: CI = Confidence Interval

\[\begin{aligned} \widehat{\text{Depression}} &= 6.4 + 0.15 \cdot \text{Fatalism} -0.32 \cdot \text{Optimism} + 0.36 \cdot \text{Spirituality} \end{aligned}\]
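
To see how the fitted line is used, the sketch below wraps the rounded coefficients in a small helper function. The function name and the example scores are hypothetical and chosen only to illustrate the arithmetic; they are not observations from the data set.

```r
# Predicted depression score from the rounded fitted coefficients above
predict_depression <- function(fatalism, optimism, spirituality) {
  6.4 + 0.15 * fatalism - 0.32 * optimism + 0.36 * spirituality
}

# Hypothetical scores, used only to illustrate the arithmetic
pred <- predict_depression(fatalism = 20, optimism = 20, spirituality = 10)
pred

# Holding optimism and spirituality fixed, a 1-point higher fatalism score
# changes the prediction by the Fatalism coefficient (0.15)
slope_check <- predict_depression(21, 20, 10) - predict_depression(20, 20, 10)
slope_check
```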

Part b

Does at least one of the covariates contribute significantly to the prediction of Depression? (Note: this is an overall test. Please follow the hypothesis test steps. To complete steps 4-6, simply output your ANOVA table.)

term                                             df.residual  rss          df      sumsq       statistic  p.value
Depression ~ 1                                   611.0000     17,167.8366  NA      NA          NA         NA
Depression ~ Fatalism + Optimism + Spirituality  608.0000     15,514.0044  3.0000  1,653.8322  21.6048    0.0000
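
The overall test compares the intercept-only model with the full model, so the null hypothesis is that all three slope coefficients are 0 and the alternative is that at least one is not. As a sketch of where the F statistic in the table comes from, the code below recomputes it from the printed residual sums of squares and degrees of freedom. `nested_f()` is a hypothetical helper name, not a function from any package.

```r
# F statistic for comparing a reduced model to a full (nested) model
nested_f <- function(rss_reduced, df_reduced, rss_full, df_full) {
  df_num <- df_reduced - df_full                 # number of coefficients tested
  f_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
  p_val  <- pf(f_stat, df_num, df_full, lower.tail = FALSE)
  c(df = df_num, statistic = f_stat, p.value = p_val)
}

# Values copied from the ANOVA table above
overall <- nested_f(rss_reduced = 17167.8366, df_reduced = 611,
                    rss_full    = 15514.0044, df_full    = 608)
overall
```

The p-value is far below any conventional significance level (the table's 0.0000 is a rounded value), so we reject the null hypothesis and conclude that at least one covariate contributes to the prediction of Depression.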

Part c

Does the addition of Spirituality add significantly to the prediction of Depression achieved by Fatalism and Optimism?

term                                             df.residual  rss        df    sumsq   statistic  p.value
Depression ~ Fatalism + Optimism                 609.00       15,710.94  NA    NA      NA         NA
Depression ~ Fatalism + Optimism + Spirituality  608.00       15,514.00  1.00  196.94  7.72       0.01
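
Because only one coefficient is being tested, this partial F-test is equivalent to the t-test for Spirituality in the full model from Question 1. Taking \(\beta_3\) to be the Spirituality coefficient (an assumed labeling that follows the order of the covariates), a worked check using the printed values, which are rounded so the last digit may differ, is:

\[\begin{aligned} H_0&: \beta_3 = 0 \quad \text{vs.} \quad H_A: \beta_3 \neq 0 \\ F &= \frac{(15{,}710.94 - 15{,}514.00)/1}{15{,}514.00/608} = \frac{196.94}{25.52} \approx 7.72 \\ t^2 &= 2.7781^2 \approx 7.72 \end{aligned}\]

The p-value is therefore the 0.0056 reported for Spirituality in Question 1, which the table above rounds to 0.01.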

Part d

term                                             df.residual  rss        df    sumsq   statistic  p.value
Depression ~ Fatalism                            610.00       16,141.67  NA    NA      NA         NA
Depression ~ Fatalism + Optimism + Spirituality  608.00       15,514.00  2.00  627.67  12.30      0.00
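
The table compares the model with Fatalism alone to the full model, so it tests whether Optimism and Spirituality together add to the prediction of Depression beyond Fatalism. Taking \(\beta_2\) and \(\beta_3\) to be the Optimism and Spirituality coefficients (an assumed labeling that follows the order of the covariates), the test can be sketched as:

\[\begin{aligned} H_0&: \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_A: \text{at least one of } \beta_2, \beta_3 \neq 0 \\ F &= \frac{(16{,}141.67 - 15{,}514.00)/2}{15{,}514.00/608} = \frac{313.835}{25.52} \approx 12.30 \end{aligned}\]

This statistic is referred to an F distribution with 2 and 608 degrees of freedom.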

Question 3

Part a

Using R, make a variable that is a factor for Diet. Make sure to check what values the original variable for Diet can take. How many indicator functions do you need to represent the categorical variable Diet (protein-rich vs. protein-poor)?

2 levels, 1 indicator
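
A minimal sketch of the factor step, assuming the original variable is coded 0/1 with 1 meaning protein-rich (check the actual coding in the data first, as the question says). The vector `diet_raw` below is a made-up stand-in for the real column.

```r
# Hypothetical raw codes; the real values must be checked in the data
diet_raw <- c(0, 1, 1, 0, 1)

# Protein-poor is listed first so it becomes the reference level
diet_f <- factor(diet_raw, levels = c(0, 1),
                 labels = c("Protein-poor", "Protein-rich"))

# A factor with k levels needs k - 1 indicator variables
n_indicators <- nlevels(diet_f) - 1
n_indicators

# The design matrix shows the single indicator column R creates
design <- model.matrix(~ diet_f)
colnames(design)
```

Choosing protein-poor as the reference level matches the \(I(\text{Protein-rich})\) indicator used in the fitted equation in Part d.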

Part b

At a level of significance \(\alpha = 0.10\), test whether protein diet modifies the effect of age on height. Justify your answer (e.g., perform a hypothesis test for the interaction between diet and age).

term                          df.residual  rss       df      sumsq     statistic  p.value
HT ~ AGE + DIET               24.0000      399.8259  NA      NA        NA         NA
HT ~ AGE + DIET + AGE * DIET  23.0000      119.4200  1.0000  280.4059  54.0055    0.0000
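
The table is a nested-model F-test whose null hypothesis is that the Age-by-Diet interaction coefficient is 0. The sketch below shows one way such a comparison can be run in base R. Because the homework data are not reproduced here, it uses simulated data with arbitrary coefficients; only the variable names (`HT`, `AGE`, `DIET`) and the sample size of 27 (implied by 23 residual degrees of freedom for a four-coefficient model) are borrowed from the output above, so the numbers it prints will not match the table.

```r
# Simulated stand-in data: NOT the homework data set
set.seed(1)
n <- 27
toy <- data.frame(
  AGE  = runif(n, 0, 2),
  DIET = factor(rep(c("Protein-poor", "Protein-rich"), length.out = n))
)
# Arbitrary coefficients, chosen only so that an interaction exists
toy$HT <- 50 + 5 * toy$AGE + 3 * toy$AGE * (toy$DIET == "Protein-rich") +
  rnorm(n, sd = 2)

# Reduced model (no interaction) versus full model (with interaction)
reduced <- lm(HT ~ AGE + DIET, data = toy)
full    <- lm(HT ~ AGE * DIET, data = toy)   # expands to AGE + DIET + AGE:DIET

# Nested-model F-test for the interaction term
cmp <- anova(reduced, full)
cmp
```

In the homework output, the interaction has F = 54.0 on 1 and 23 degrees of freedom with a p-value below \(\alpha = 0.10\), so we reject the null hypothesis and conclude that diet modifies the effect of age on height.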

Part c

Is it possible that diet is a confounder? Note: this will depend on your results from Part b.

Part d

Write the fitted regression equation for our model in Part b. Write the respective regression lines for each specific diet group: protein rich and protein poor. Interpret the slope of each regression line (no need for a 95% CI here).

Fitted regression equation:

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = 51.23+8.69\cdot \text{Age}-0.90\cdot I(\text{Protein-rich})+7.32\cdot \text{Age}\cdot I(\text{Protein-rich})\]

Protein rich group \(I(\text{Protein-rich})=1\):

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]

Protein poor group \(I(\text{Protein-rich})=0\):

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]
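
As a general template rather than the numeric answer, label the four fitted coefficients \(\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3\) in the order they appear in the fitted equation (intercept, Age, indicator, Age-by-indicator). These labels are an assumption made for illustration. Substituting the indicator value and collecting terms gives the form of each group's line:

\[\begin{aligned} I(\text{Protein-rich})=1:&\quad \widehat{\text{Height}} = (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3)\cdot \text{Age} \\ I(\text{Protein-rich})=0:&\quad \widehat{\text{Height}} = \hat\beta_0 + \hat\beta_1\cdot \text{Age} \end{aligned}\]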

 

Interpretation: You need to work through this!

Question 4

Part a

Using \(\alpha= 0.05\), test whether there is a crude association between HDL measurement and total cholesterol. Note: testing for a crude association means we fit a simple linear regression model and see if the association is significant.

term       df      sumsq      meansq   statistic  p.value
X1         1.000   46.236     46.236   0.405      0.528
Residuals  40.000  4,567.383  114.185  NA         NA
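
The statistic in the table is the ratio of the two mean squares, and the same sums of squares give the proportion of variation explained by the simple linear regression. A worked check from the printed values:

\[\begin{aligned} F &= \frac{\text{MSR}}{\text{MSE}} = \frac{46.236}{114.185} \approx 0.405 \\ R^2 &= \frac{46.236}{46.236 + 4{,}567.383} \approx 0.010 \end{aligned}\]

With a p-value of 0.528, which is above \(\alpha = 0.05\), there is insufficient evidence of a crude association.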

Part b

Sometimes simple linear regression leads us to believe that there is no association between two variables, but a missing interaction term might be obscuring the association. Use \(\alpha= 0.1\) to test whether total triglyceride is an effect modifier of the association between HDL and total cholesterol.

term                   df.residual  rss        df     sumsq    statistic  p.value
Y ~ X1 + X2            39.000       4,478.237  NA     NA       NA         NA
Y ~ X1 + X2 + X1 * X2  38.000       4,195.314  1.000  282.923  2.563      0.118
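
As a sketch of the decision step, the p-value can be recovered from the F statistic and its degrees of freedom and then compared with \(\alpha = 0.1\). The values are copied from the table above; the variable names are illustrative.

```r
# Values copied from the interaction row of the table above
f_stat <- 2.563
df_num <- 1
df_den <- 38
alpha  <- 0.1

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
round(p_value, 3)

# TRUE would mean the interaction is significant at level alpha
reject_h0 <- p_value < alpha
reject_h0
```

Because the p-value (0.118 in the table) exceeds 0.1, we fail to reject the null hypothesis, so there is insufficient evidence that total triglyceride modifies the association.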

Part c

Is it possible that total triglyceride is a confounder? No need to test this explicitly.