Homework 4 Answers

BSTA 512/612

Due: Friday February 28, 2025 at 11pm

Author

Your name here!!!

Modified

January 29, 2026

Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.

Questions

Question 1

library(haven); library(here)  # read_sas(), here()
dep_df = read_sas(here("data/completedata.sas7bdat"))

Part a

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	6.4144	2.0501	3.1288	0.0018	2.3882	10.4406
Fatalism	0.1527	0.0452	3.3784	0.0008	0.0639	0.2414
Optimism	−0.3179	0.0722	−4.4058	0.0000	−0.4596	−0.1762
Spirituality	0.3587	0.1291	2.7781	0.0056	0.1051	0.6122

Another fun way to display the same model:

library(gtsummary)  # tbl_regression()
tbl_regression(q2_mod_f1, intercept = TRUE)

Characteristic	Beta	95% CI	p-value
(Intercept)	6.4	2.4, 10	0.002
Fatalism	0.15	0.06, 0.24	<0.001
Optimism	-0.32	-0.46, -0.18	<0.001
Spirituality	0.36	0.11, 0.61	0.006
Abbreviation: CI = Confidence Interval

Part b

\(\beta_0\): The expected depression score is 6.4 when fatalism, optimism, and spirituality scores are all 0 (95% CI: 2.4, 10.4).
- Same as homework 2: The intercept does not make sense. A score of 0 is outside the range of possible scores for fatalism, optimism, and spirituality.
\(\beta_1\): For every 1-point higher fatalism score, the expected depression score is 0.15 points higher, adjusting for optimism and spirituality scores (95% CI: 0.06, 0.24).
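The 95% CI in the \(\beta_1\) interpretation can be rebuilt by hand as estimate ± t × SE, which is the same calculation `confint()` does for an `lm` fit. A minimal sketch using the rounded Fatalism numbers from the Part a table; the residual df of 608 is taken from the Question 2 ANOVA table for the same model (611 + 1 = 612 observations, 4 coefficients).

```r
# 95% CI for the Fatalism slope: estimate +/- t(0.975, df) * SE
est      <- 0.1527  # Fatalism estimate (Part a table, rounded)
se       <- 0.0452  # Fatalism standard error (Part a table, rounded)
df_resid <- 608     # residual df of the full model (n = 612, 4 coefficients)

t_crit <- qt(0.975, df = df_resid)   # close to 1.96 because df is large
ci     <- est + c(-1, 1) * t_crit * se

round(ci, 2)   # 0.06 0.24, as in the interpretation
```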

Part c

Not given

Part d

\[\begin{aligned} \widehat{\text{Depression}} &= 5.39 + 0.15 \cdot \text{Fatalism} \end{aligned}\]

Question 2

Part a

Fit the regression model with all the covariates (Fatalism, Optimism, Spirituality), display the regression table, and write out the fitted regression line.

Characteristic	Beta	95% CI	p-value
(Intercept)	6.4	2.4, 10	0.002
Fatalism	0.15	0.06, 0.24	<0.001
Optimism	-0.32	-0.46, -0.18	<0.001
Spirituality	0.36	0.11, 0.61	0.006
Abbreviation: CI = Confidence Interval

\[\begin{aligned} \widehat{\text{Depression}} &= 6.4 + 0.15 \cdot \text{Fatalism} -0.32 \cdot \text{Optimism} + 0.36 \cdot \text{Spirituality} \end{aligned}\]

Part b

Does at least one of the covariates contribute significantly to the prediction of Depression? (Note: this is an overall test. Please follow the hypothesis test steps. To complete steps 4-6, simply output your ANOVA table.)

term	df.residual	rss	df	sumsq	statistic	p.value
Depression ~ 1	611.0000	17,167.8366	NA	NA	NA	NA
Depression ~ Fatalism + Optimism + Spirituality	608.0000	15,514.0044	3.0000	1,653.8322	21.6048	0.0000
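The statistic in this table is the overall F-test comparing the intercept-only model with the full model: F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)] / (RSS_full / df_full). A minimal sketch of that arithmetic using only the RSS and residual df printed above. The column names suggest the table came from something like `broom::tidy(anova(reduced, full))`, but that is an assumption; the sketch does not need the data.

```r
# Overall F-test (Question 2 Part b) from the printed ANOVA table
rss_reduced <- 17167.8366  # Depression ~ 1
rss_full    <- 15514.0044  # Depression ~ Fatalism + Optimism + Spirituality
df_reduced  <- 611
df_full     <- 608

df_num <- df_reduced - df_full   # 3 slopes are tested at once
f_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
p_val  <- pf(f_stat, df1 = df_num, df2 = df_full, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```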

Part c

Does the addition of Spirituality add significantly to the prediction of Depression achieved by Fatalism and Optimism?

term	df.residual	rss	df	sumsq	statistic	p.value
Depression ~ Fatalism + Optimism	609.00	15,710.94	NA	NA	NA	NA
Depression ~ Fatalism + Optimism + Spirituality	608.00	15,514.00	1.00	196.94	7.72	0.01
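Because this partial F-test adds a single coefficient, F should equal the square of the t statistic for Spirituality in the full model (2.7781 in Question 1 Part a), and the two tests give the same p-value. A small check using the rounded RSS values above:

```r
# Partial F-test for adding Spirituality (1 numerator df)
rss_reduced <- 15710.94   # Depression ~ Fatalism + Optimism
rss_full    <- 15514.00   # Depression ~ Fatalism + Optimism + Spirituality
df_full     <- 608

f_stat <- (rss_reduced - rss_full) / (rss_full / df_full)
p_val  <- pf(f_stat, df1 = 1, df2 = df_full, lower.tail = FALSE)

# With 1 numerator df, F = t^2 for the added coefficient
t_spirit <- 2.7781        # t statistic for Spirituality, Question 1 Part a
c(F = f_stat, t_squared = t_spirit^2, p = p_val)
```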

Part d

term	df.residual	rss	df	sumsq	statistic	p.value
Depression ~ Fatalism	610.00	16,141.67	NA	NA	NA	NA
Depression ~ Fatalism + Optimism + Spirituality	608.00	15,514.00	2.00	627.67	12.30	0.00
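The question text for Part d is not shown here; from the table it compares the Fatalism-only model with the full model, so H0 is that the Optimism and Spirituality coefficients are both 0. A sketch of the critical-value version of the decision, assuming \(\alpha = 0.05\) since no level is stated for this part in the excerpt:

```r
# Partial F-test: drop Optimism and Spirituality together (2 numerator df)
rss_reduced <- 16141.67   # Depression ~ Fatalism
rss_full    <- 15514.00   # Depression ~ Fatalism + Optimism + Spirituality
df_num      <- 2
df_full     <- 608
alpha       <- 0.05       # assumed level; not stated for this part here

f_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
f_crit <- qf(1 - alpha, df1 = df_num, df2 = df_full)  # rejection cutoff

# TRUE means reject H0: both slopes (Optimism, Spirituality) are 0
f_stat > f_crit
```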

Question 3

Part a

Using R, make a variable that is a factor for Diet. Make sure to check what values the original variable for Diet can take. How many indicator functions do you need to represent the categorical variable Diet (protein-rich vs. protein-poor)?

2 levels (protein-rich, protein-poor), so 1 indicator
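A minimal sketch of the factor and its single indicator. The raw coding used here (0 = protein-poor, 1 = protein-rich) and the toy vector are hypothetical; check the actual values of DIET in the data, e.g. with `table()`, before choosing `levels` and `labels`. Listing protein-poor first makes it the reference level, which matches the \(I(\text{Protein-rich})\) term in the Part d equation.

```r
# Hypothetical raw coding: 0 = protein-poor, 1 = protein-rich.
# Check the real coding first, e.g. table() on the raw DIET column.
diet_raw <- c(0, 1, 1, 0, 1)   # toy values, not the homework data

diet <- factor(diet_raw, levels = c(0, 1),
               labels = c("Protein-poor", "Protein-rich"))

nlevels(diet)              # 2 levels
X <- model.matrix(~ diet)  # design matrix lm() would build
colnames(X)                # "(Intercept)" "dietProtein-rich": 1 indicator
```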

Part b

At a level of significance \(\alpha = 0.10\), test whether protein diet modifies the effect of age on height. Justify your answer (e.g., perform a hypothesis test for the interaction between diet and age).

term	df.residual	rss	df	sumsq	statistic	p.value
HT ~ AGE + DIET	24.0000	399.8259	NA	NA	NA	NA
HT ~ AGE + DIET + AGE * DIET	23.0000	119.4200	1.0000	280.4059	54.0055	0.0000
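Two notes on this table. In an R formula, `AGE * DIET` already expands to `AGE + DIET + AGE:DIET`, so the second model adds only the interaction term (1 df). The sketch below checks that expansion with `terms()` and then recomputes the partial F-test at \(\alpha = 0.10\) from the printed RSS values:

```r
# AGE * DIET expands to AGE + DIET + AGE:DIET, so repeating the main
# effects changes nothing; no data are needed to inspect the terms.
term_labels <- attr(terms(HT ~ AGE + DIET + AGE * DIET), "term.labels")
term_labels   # "AGE" "DIET" "AGE:DIET"

# Partial F-test for the interaction (1 numerator df)
rss_reduced <- 399.8259   # HT ~ AGE + DIET
rss_full    <- 119.4200   # HT ~ AGE + DIET + AGE:DIET
df_full     <- 23
alpha       <- 0.10

f_stat <- (rss_reduced - rss_full) / (rss_full / df_full)
p_val  <- pf(f_stat, df1 = 1, df2 = df_full, lower.tail = FALSE)
p_val < alpha   # TRUE means reject H0 that the AGE:DIET coefficient is 0
```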

Part c

Is it possible that diet is a confounder? Note: this will depend on your results from Part b.

Part d

Write the fitted regression equation for our model in Part b. Write the respective regression lines for each specific diet group: protein rich and protein poor. Interpret the slope of each regression line (no need for a 95% CI here).

Fitted regression equation:

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = 51.23+8.69\cdot \text{Age}-0.90\cdot I(\text{Protein-rich})+7.32\cdot \text{Age}\cdot I(\text{Protein-rich})\]

Protein-rich group \(I(\text{Protein-rich})=1\):

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]

Protein-poor group \(I(\text{Protein-rich})=0\):

\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]

Interpretation: You need to work through this!

Question 4

Part a

Using \(\alpha = 0.05\), test whether there is a crude association between HDL measurement and total cholesterol. Note: testing for a crude association means we fit a simple linear regression model and see if the association is significant.

term	df	sumsq	meansq	statistic	p.value
X1	1.000	46.236	46.236	0.405	0.528
Residuals	40.000	4,567.383	114.185	NA	NA
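A sketch recomputing the crude test from this ANOVA table, plus R², the share of the variation in Y explained by X1. The names X1 and Y are taken from the table as-is; which of them is HDL and which is total cholesterol is not stated in this excerpt.

```r
# Crude association: simple linear regression ANOVA table (Question 4 Part a)
ss_model <- 46.236     # sum of squares for X1
ss_resid <- 4567.383   # residual sum of squares
df_resid <- 40

f_stat <- ss_model / (ss_resid / df_resid)    # 1 numerator df
p_val  <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
r_sq   <- ss_model / (ss_model + ss_resid)    # share of variation in Y explained

round(c(F = f_stat, p = p_val, R2 = r_sq), 3)
```

With R² near 0.01, X1 accounts for about 1% of the variation in Y in this crude model.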

Part b

Sometimes simple linear regression leads us to believe that there is no association between two variables, but a missing interaction might be obscuring the association. Use \(\alpha = 0.1\) to test whether total triglyceride is an effect modifier of the association between HDL and total cholesterol.

term	df.residual	rss	df	sumsq	statistic	p.value
Y ~ X1 + X2	39.000	4,478.237	NA	NA	NA	NA
Y ~ X1 + X2 + X1 * X2	38.000	4,195.314	1.000	282.923	2.563	0.118
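The same partial F-test arithmetic applied to the interaction in this table, ending with the decision at \(\alpha = 0.1\):

```r
# Partial F-test for the X1:X2 interaction (Question 4 Part b)
rss_reduced <- 4478.237   # Y ~ X1 + X2
rss_full    <- 4195.314   # Y ~ X1 + X2 + X1:X2
df_full     <- 38
alpha       <- 0.10

f_stat <- (rss_reduced - rss_full) / (rss_full / df_full)
p_val  <- pf(f_stat, df1 = 1, df2 = df_full, lower.tail = FALSE)
p_val < alpha   # FALSE: p is above 0.10, so we fail to reject H0
```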

Part c

Is it possible that total triglyceride is a confounder? No need to test this explicitly.