Homework 4 Answers
BSTA 512/612
Answers are not necessarily complete! This is just meant to serve as a check if you are stuck.
dep_df = read_sas(here("data/completedata.sas7bdat"))
Questions
Question 1
Part a
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 6.4144 | 2.0501 | 3.1288 | 0.0018 | 2.3882 | 10.4406 |
| Fatalism | 0.1527 | 0.0452 | 3.3784 | 0.0008 | 0.0639 | 0.2414 |
| Optimism | −0.3179 | 0.0722 | −4.4058 | 0.0000 | −0.4596 | −0.1762 |
| Spirituality | 0.3587 | 0.1291 | 2.7781 | 0.0056 | 0.1051 | 0.6122 |
Another fun way to display:
tbl_regression(q2_mod_f1, intercept = TRUE)

| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| (Intercept) | 6.4 | 2.4, 10 | 0.002 |
| Fatalism | 0.15 | 0.06, 0.24 | <0.001 |
| Optimism | -0.32 | -0.46, -0.18 | <0.001 |
| Spirituality | 0.36 | 0.11, 0.61 | 0.006 |
| Abbreviation: CI = Confidence Interval | | | |
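The `statistic`, `conf.low`, and `conf.high` columns of the first table in Part a can be rebuilt from `estimate` and `std.error` alone, which is a handy way to check a regression table. Below is a minimal sketch in base R. It assumes the residual degrees of freedom are 608 (the value reported for this model in the Question 2 ANOVA tables), and the object names are illustrative only.

```r
# Values copied from the first regression table in Part a
estimate  <- c(Intercept = 6.4144, Fatalism = 0.1527,
               Optimism = -0.3179, Spirituality = 0.3587)
std_error <- c(Intercept = 2.0501, Fatalism = 0.0452,
               Optimism = 0.0722, Spirituality = 0.1291)

# Assumption: residual df = 608, taken from the Question 2 ANOVA tables
df_resid <- 608

t_stat    <- estimate / std_error                 # test statistic per term
t_crit    <- qt(0.975, df = df_resid)             # two-sided 95% critical value
conf_low  <- estimate - t_crit * std_error
conf_high <- estimate + t_crit * std_error
p_value   <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

round(cbind(t_stat, p_value, conf_low, conf_high), 4)
```

Small differences in the last digit are expected because the inputs are already rounded to four decimals.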
Part b
\(\beta_0\): The expected depression score is 6.4 when fatalism, optimism, and spirituality scores are 0 (95% CI: 2.4, 10.4).
- Same as homework 2: The intercept does not make sense. A score of 0 is outside the range of possible scores for fatalism, optimism, and spirituality.
\(\beta_1\): For every 1-point higher fatalism score, the expected depression score is 0.15 points higher, adjusting for optimism and spirituality scores (95% CI: 0.06, 0.24).
Part c
Not given
Part d
\[\begin{aligned} \widehat{\text{Depression}} &= 5.39 + 0.15 \cdot \text{Fatalism} \end{aligned}\]
Question 2
Part a
Fit the regression model with all the covariates (Fatalism, Optimism, Spirituality), display the regression table, and write out the fitted regression line.
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| (Intercept) | 6.4 | 2.4, 10 | 0.002 |
| Fatalism | 0.15 | 0.06, 0.24 | <0.001 |
| Optimism | -0.32 | -0.46, -0.18 | <0.001 |
| Spirituality | 0.36 | 0.11, 0.61 | 0.006 |
| Abbreviation: CI = Confidence Interval | | | |
\[\begin{aligned} \widehat{\text{Depression}} &= 6.4 + 0.15 \cdot \text{Fatalism} -0.32 \cdot \text{Optimism} + 0.36 \cdot \text{Spirituality} \end{aligned}\]
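The fitted line can be turned into a small function to see how a prediction changes with each covariate. This is only a sketch: `predict_depression` is a hypothetical helper, the coefficients are the rounded values from the equation above, and the input scores are made up for illustration.

```r
# Coefficients copied (rounded) from the fitted regression line above
predict_depression <- function(fatalism, optimism, spirituality) {
  6.4 + 0.15 * fatalism - 0.32 * optimism + 0.36 * spirituality
}

# Hypothetical scores, chosen only for illustration
pred_ref  <- predict_depression(fatalism = 20, optimism = 20, spirituality = 4)
pred_plus <- predict_depression(fatalism = 21, optimism = 20, spirituality = 4)

pred_ref              # predicted depression score at the hypothetical scores
pred_plus - pred_ref  # change for a 1-point higher fatalism score
```

Holding optimism and spirituality fixed, the difference between the two predictions equals the fatalism coefficient, which is the "adjusting for" interpretation in numeric form.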
Part b
Does at least one of the covariates contribute significantly to the prediction of Depression? (Note: this is an overall test. Please follow the hypothesis test steps. To complete steps 4-6, simply output your ANOVA table.)
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| Depression ~ 1 | 611.0000 | 17,167.8366 | NA | NA | NA | NA |
| Depression ~ Fatalism + Optimism + Spirituality | 608.0000 | 15,514.0044 | 3.0000 | 1,653.8322 | 21.6048 | 0.0000 |
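The F statistic in the table can be reproduced by hand from the two residual sums of squares, which may help when checking steps 4-6. The sketch below uses only numbers copied from the table; the variable names are illustrative. As a by-product it also computes the proportion of variation explained by the full model.

```r
# Values copied from the ANOVA table above
rss_null <- 17167.8366  # Depression ~ 1
rss_full <- 15514.0044  # Depression ~ Fatalism + Optimism + Spirituality
df_null  <- 611
df_full  <- 608

df_num <- df_null - df_full  # number of coefficients being tested
f_stat <- ((rss_null - rss_full) / df_num) / (rss_full / df_full)
p_val  <- pf(f_stat, df_num, df_full, lower.tail = FALSE)

# By-product: share of total variation explained by the full model
r_squared <- 1 - rss_full / rss_null

c(F = f_stat, p = p_val, R2 = r_squared)
```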
Part c
Does the addition of Spirituality add significantly to the prediction of Depression achieved by Fatalism and Optimism?
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| Depression ~ Fatalism + Optimism | 609.00 | 15,710.94 | NA | NA | NA | NA |
| Depression ~ Fatalism + Optimism + Spirituality | 608.00 | 15,514.00 | 1.00 | 196.94 | 7.72 | 0.01 |
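When a partial F-test adds a single coefficient, its F statistic equals the square of that coefficient's t statistic in the full model. The sketch below checks this with the Spirituality row from Question 1, Part a; agreement is only approximate because the table values are rounded.

```r
# t statistic for Spirituality, copied from the Question 1, Part a table
t_spirituality <- 2.7781

# Partial F statistic rebuilt from the ANOVA table above
f_partial <- 196.94 / (15514.00 / 608)

# The two quantities should agree up to rounding
c(t_squared = t_spirituality^2, F = f_partial)

# Both tests give the same p-value
p_from_f <- pf(f_partial, 1, 608, lower.tail = FALSE)
p_from_t <- 2 * pt(abs(t_spirituality), df = 608, lower.tail = FALSE)
c(p_from_f = p_from_f, p_from_t = p_from_t)
```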
Part d
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| Depression ~ Fatalism | 610.00 | 16,141.67 | NA | NA | NA | NA |
| Depression ~ Fatalism + Optimism + Spirituality | 608.00 | 15,514.00 | 2.00 | 627.67 | 12.30 | 0.00 |
Question 3
Part a
Using R, make a variable that is a factor for Diet. Make sure to check what values the original variable for Diet can take. How many indicator functions do you need to represent the categorical variable Diet (protein-rich vs. protein-poor)?
2 levels, 1 indicator
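A short sketch of the factor step on toy data. The 0/1 coding, the labels, and the data frame `toy` are assumptions made for illustration; check the actual values in the homework dataset before reusing this.

```r
# Hypothetical toy data: the 0/1 coding is an assumption,
# not necessarily the coding used in the homework dataset
toy <- data.frame(DIET = c(0, 1, 1, 0, 1, 0))

table(toy$DIET)  # check which values the original variable takes

# Convert to a factor with readable labels (labels are also assumed)
toy$DIET_f <- factor(toy$DIET, levels = c(0, 1),
                     labels = c("Protein-poor", "Protein-rich"))

# A factor with k levels needs k - 1 indicators
n_indicators <- nlevels(toy$DIET_f) - 1
n_indicators

# The design matrix shows the single indicator column R creates
design <- model.matrix(~ DIET_f, data = toy)
colnames(design)
```

With this level order, the protein-poor group is the reference and the one indicator column marks the protein-rich group.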
Part b
At a level of significance \(\alpha = 0.10\), test whether protein diet modifies the effect of age on height. Justify your answer (e.g., perform a hypothesis test for the interaction between diet and age).
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| HT ~ AGE + DIET | 24.0000 | 399.8259 | NA | NA | NA | NA |
| HT ~ AGE + DIET + AGE * DIET | 23.0000 | 119.4200 | 1.0000 | 280.4059 | 54.0055 | 0.0000 |
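The decision at \(\alpha = 0.10\) can also be made with the critical-value approach, comparing the F statistic to the 90th percentile of the F distribution. A minimal sketch using the numbers in the table above (variable names are illustrative):

```r
alpha <- 0.10

# Values copied from the ANOVA table above
ss_interaction <- 280.4059  # drop in RSS from adding AGE * DIET
rss_full       <- 119.4200
df_full        <- 23

f_stat <- (ss_interaction / 1) / (rss_full / df_full)
f_crit <- qf(1 - alpha, df1 = 1, df2 = df_full)  # critical value
p_val  <- pf(f_stat, 1, df_full, lower.tail = FALSE)

# Reject the null of no interaction when F exceeds the critical value
reject_null <- f_stat > f_crit
c(F = f_stat, F_crit = f_crit, p = p_val, reject = reject_null)
```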
Part c
Is it possible that diet is a confounder? Note: this will depend on your results from Part b.
Part d
Write the fitted regression equation for our model in Part b. Write the respective regression lines for each specific diet group: protein rich and protein poor. Interpret the slope of each regression line (no need for a 95% CI here).
Fitted regression equation:
\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = 51.23+8.69\cdot \text{Age}-0.90\cdot I(\text{Protein-rich})+7.32\cdot \text{Age}\cdot I(\text{Protein-rich})\]
Protein-rich group \(I(\text{Protein-rich})=1\):
\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]
Protein-poor group \(I(\text{Protein-rich})=0\):
\[\widehat{\text{Height}}|\text{Age}, \text{Diet} = ??\]
Interpretation: You need to work through this!
Question 4
Part a
Using \(\alpha= 0.05\), test whether there is a crude association between HDL measurement and total cholesterol. Note: testing for a crude association means we fit a simple linear regression model and see if the association is significant.
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| X1 | 1.000 | 46.236 | 46.236 | 0.405 | 0.528 |
| Residuals | 40.000 | 4,567.383 | 114.185 | NA | NA |
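This table has a different layout from the model-comparison tables: each row reports a sum of squares and its degrees of freedom, and the F statistic is the ratio of the two mean squares. A brief sketch rebuilding the `meansq`, `statistic`, and `p.value` columns from the table values (names are illustrative):

```r
# Values copied from the ANOVA table above
ss_model <- 46.236;   df_model <- 1
ss_resid <- 4567.383; df_resid <- 40

ms_model <- ss_model / df_model  # mean square for X1
ms_resid <- ss_resid / df_resid  # mean square error

f_stat <- ms_model / ms_resid
p_val  <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

round(c(ms_resid = ms_resid, F = f_stat, p = p_val), 3)
```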
Part b
Sometimes simple linear regression leads us to believe that there is no association between two variables, but a missing interaction might be obscuring the association. Use \(\alpha= 0.1\) to test whether total triglyceride is an effect modifier of the association between HDL and total cholesterol.
| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| Y ~ X1 + X2 | 39.000 | 4,478.237 | NA | NA | NA | NA |
| Y ~ X1 + X2 + X1 * X2 | 38.000 | 4,195.314 | 1.000 | 282.923 | 2.563 | 0.118 |
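Here the p-value is close to the significance level, so it is worth making the comparison explicit. A minimal sketch of the p-value approach, using the numbers in the table above (variable names are illustrative):

```r
alpha <- 0.10

# Values copied from the ANOVA table above
ss_interaction <- 282.923   # drop in RSS from adding X1 * X2
rss_full       <- 4195.314
df_full        <- 38

f_stat <- (ss_interaction / 1) / (rss_full / df_full)
p_val  <- pf(f_stat, 1, df_full, lower.tail = FALSE)

# Reject the null of no interaction only if the p-value is below alpha
reject_null <- p_val < alpha
c(F = f_stat, p = p_val, reject = reject_null)
```

Since the p-value is above 0.10, this check does not reject the null hypothesis of no interaction at that level.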
Part c
Is it possible that total triglyceride is a confounder? No need to test this explicitly.