Homework 4

BSTA 512/612

Author

Your name here!!!

Modified

February 26, 2026

Directions

  • Download the .qmd file here.

  • You will need to download the datasets. Use this link to download the HW4 datasets needed in this assignment. If you do not want to make changes to the paths set in this document, then make sure the files are stored in a folder named “data” that is housed in the same location as your HW4 .qmd file.

  • Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.

  • For each question, make sure to include all code and resulting output in the html file to support your answers.

  • Show the work of your calculations using R code within a code chunk. Make sure that both your code and output are visible in the rendered html file. This is the default setting.

  • Write all answers in complete sentences as if communicating the results to a collaborator.

Tip

It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors sooner.

Questions

Question 1

This question uses the same dataset as HW 2, question 1.

This question is based on data collected as part of an observational study of patients who suffered from stroke.

Dataset: The main goal was to study various psychological factors: optimism, fatalism, depression, spirituality, and their relationship with stroke severity and other health outcomes among the study participants. Data were collected using questionnaires during a baseline interview and also medical chart review. More information about this study can be found in the article Fatalism, optimism, spirituality, depressive symptoms and stroke outcome: a population based analysis.

The dataset that you will work with is called completedata.sas7bdat. The variables we are interested in are:

  • Covariate 1: Fatalism (larger values indicate that the individual feels less control of their life)

    • Potential scores range from 8 to 40
  • Covariate 2: Optimism (larger values indicate that the individual feels higher levels of optimism)

    • Potential scores range from 6 to 24
  • Covariate 3: Spirituality (larger values indicate that the individual has more belief in a higher power)

    • Potential scores range from 2 to 8
  • Outcome: Depression (larger values imply increased depression)

    • Potential scores range from 0 to 27

For our homework purposes we will assume each variable is continuous.

dep_df <- haven::read_sas(here::here("data", "completedata.sas7bdat"))

Part a

Fit the regression model with all the covariates (Fatalism, Optimism, Spirituality), display the regression table, and write out the fitted regression line.
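As a reminder of the mechanics, here is a minimal sketch of fitting a multiple linear regression and displaying its regression table in base R. It runs on simulated data, and the column names (fatalism, optimism, spirituality, depression) are hypothetical stand-ins that may not match the names in completedata.sas7bdat.

```r
# Simulated stand-in data. The column names are hypothetical and may not
# match the variable names in completedata.sas7bdat.
set.seed(512)
n <- 200
sim_df <- data.frame(
  fatalism     = round(runif(n, 8, 40)),
  optimism     = round(runif(n, 6, 24)),
  spirituality = round(runif(n, 2, 8))
)
sim_df$depression <- 5 + 0.2 * sim_df$fatalism - 0.3 * sim_df$optimism +
  rnorm(n, sd = 3)

# Multiple linear regression with three covariates
fit_full <- lm(depression ~ fatalism + optimism + spirituality, data = sim_df)

# Regression table: estimates, standard errors, t statistics, p-values
summary(fit_full)$coefficients

# 95% confidence intervals for each coefficient
confint(fit_full)
```

The estimates in the first column of the table are the values that go into the fitted regression line.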

Part b

Interpret each coefficient (\(\beta_0\), \(\beta_1\), \(\beta_2\), \(\beta_3\)).

Does the intercept make sense for the range of values that each covariate can take? Explain.

Part c

Recall that in Homework 2, we ran a simple linear regression model for Depression vs. Fatalism with the following interpretation for the coefficient: For every 1 point higher fatalism score, there is an expected difference of 0.25 points higher depression score (95% CI: 0.17, 0.32).

Does the addition of Optimism and Spirituality change our coefficient estimate for Fatalism? (No need for an official hypothesis test here. I just want us to note some differences.)

Part d

From the fitted regression model, calculate the regression line when Optimism score is 10 and Spirituality score is 6.
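The algebra behind this step can be sketched as follows, assuming the coefficients are ordered as in the covariate list above (\(\beta_1\) for Fatalism \(F\), \(\beta_2\) for Optimism \(O\), \(\beta_3\) for Spirituality \(S\)). Fixing two covariates folds their terms into the intercept and leaves a line in the remaining covariate:

\[
\begin{aligned}
\widehat{Y} &= \widehat{\beta}_0 + \widehat{\beta}_1 F + \widehat{\beta}_2 O + \widehat{\beta}_3 S \\
\widehat{Y} \mid (O = 10,\, S = 6) &= \left(\widehat{\beta}_0 + 10\,\widehat{\beta}_2 + 6\,\widehat{\beta}_3\right) + \widehat{\beta}_1 F
\end{aligned}
\]

The slope for Fatalism is unchanged; only the intercept shifts.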

Question 2

This question uses the same dataset as HW 2, question 1 and HW 3, question 4.

This question is based on data collected as part of an observational study of patients who suffered from stroke.

Dataset: The main goal was to study various psychological factors: optimism, fatalism, depression, spirituality, and their relationship with stroke severity and other health outcomes among the study participants. Data were collected using questionnaires during a baseline interview and also medical chart review. More information about this study can be found in the article Fatalism, optimism, spirituality, depressive symptoms and stroke outcome: a population based analysis.

The dataset that you will work with is called completedata.sas7bdat. The variables we are interested in are:

  • Covariate 1: Fatalism (larger values indicate that the individual feels less control of their life)

    • Potential scores range from 8 to 40
  • Covariate 2: Optimism (larger values indicate that the individual feels higher levels of optimism)

    • Potential scores range from 6 to 24
  • Covariate 3: Spirituality (larger values indicate that the individual has more belief in a higher power)

    • Potential scores range from 2 to 8
  • Outcome: Depression (larger values imply increased depression)

    • Potential scores range from 0 to 27

For our homework purposes we will assume each variable is continuous.

dep_df <- haven::read_sas(here::here("data", "completedata.sas7bdat"))

Part a

Fit the regression model with all the covariates (Fatalism, Optimism, Spirituality), display the regression table, and write out the fitted regression line.

Part b

Does at least one of the covariates contribute significantly to the prediction of Depression? (Note: this is an overall test. Please follow the hypothesis test steps. To complete steps 5-6, simply output your ANOVA table.)

Note

We have not covered how to check the model assumptions in a multiple linear regression, so you can skip that step.
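One way to produce the ANOVA table for an overall test is sketched below on simulated data with placeholder names (y, x1, x2, x3); these are assumptions for illustration, not the homework variables. The sketch compares an intercept-only model to the full model, which yields the same F statistic that summary() reports for the full model.

```r
# Simulated data with placeholder variable names (not the homework data)
set.seed(612)
n <- 150
sim_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim_df$y <- 1 + 0.5 * sim_df$x1 + rnorm(n)

# Reduced model: intercept only. Full model: all three covariates.
fit_null <- lm(y ~ 1, data = sim_df)
fit_full <- lm(y ~ x1 + x2 + x3, data = sim_df)

# ANOVA table for H0: all three slopes equal 0
overall_test <- anova(fit_null, fit_full)
overall_test

# The same F statistic, with its degrees of freedom, from summary()
summary(fit_full)$fstatistic
```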

Part c

Does the addition of Spirituality add significantly to the prediction of Depression achieved by Fatalism and Optimism?

Note

We have not covered how to check the model assumptions in a multiple linear regression, so you can skip that step.

Part d

Does the addition of Spirituality and Optimism add significantly to the prediction of Depression achieved by Fatalism?

Note

We have not covered how to check the model assumptions in a multiple linear regression, so you can skip that step.
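Tests of whether one covariate or a group of covariates adds to a model that already contains others can be run as nested-model (partial) F-tests. The sketch below uses simulated data and placeholder names (y, x1, x2, x3), all of which are assumptions for illustration. It also checks a known identity: when a single covariate is added, the partial F statistic equals the square of that covariate's t statistic in the full model.

```r
# Simulated data with placeholder variable names (not the homework data)
set.seed(2026)
n <- 150
sim_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim_df$y <- 2 + 0.6 * sim_df$x1 - 0.4 * sim_df$x2 + rnorm(n)

fit_x1   <- lm(y ~ x1, data = sim_df)
fit_x1x2 <- lm(y ~ x1 + x2, data = sim_df)
fit_full <- lm(y ~ x1 + x2 + x3, data = sim_df)

# Adding one covariate (x3) to a model with x1 and x2
one_added <- anova(fit_x1x2, fit_full)
one_added

# Adding a group of two covariates (x2 and x3) to a model with x1
two_added <- anova(fit_x1, fit_full)
two_added

# With one added covariate, partial F equals the squared t statistic
t_x3 <- summary(fit_full)$coefficients["x3", "t value"]
c(partial_F = one_added$F[2], t_squared = t_x3^2)
```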

Question 3

A team of nutrition experts investigated the influence of protein content in diet on the relationship between age (explanatory variable) and height (outcome, in centimeters) for children. Using the dataset, CH12Q03.xls, answer the following questions.

This question was adapted from this textbook.

Part a

Using R, make a variable that is a factor for Diet. Make sure to check what values the original variable for Diet can take. How many indicator functions do you need to represent the categorical variable Diet (protein-rich vs. protein-poor)?

Part b

At a level of significance \(\alpha = 0.10\), test whether protein diet modifies the effect of age on height. Justify your answer (e.g., perform a hypothesis test for the interaction between diet and age).

Note: recall that we model an effect modifier with an interaction.
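A minimal sketch of modeling effect modification with an interaction term is shown below. The data are simulated, and the names (age, diet, height) and factor labels ("poor", "rich") are hypothetical; the actual coding in CH12Q03.xls may differ.

```r
# Simulated data. Variable names and factor labels are hypothetical.
set.seed(12)
n <- 60
sim_df <- data.frame(
  age  = runif(n, 1, 10),
  diet = factor(rep(c("poor", "rich"), each = n / 2))
)
sim_df$height <- 50 + 2 * sim_df$age +
  ifelse(sim_df$diet == "rich", 1.5 * sim_df$age, 0) + rnorm(n, sd = 3)

# Main-effects model and interaction model
fit_main <- lm(height ~ age + diet, data = sim_df)
fit_int  <- lm(height ~ age * diet, data = sim_df)  # age + diet + age:diet

# Row of the regression table for the interaction coefficient
summary(fit_int)$coefficients["age:dietrich", ]

# Equivalent nested-model F-test for the interaction
anova(fit_main, fit_int)
```

With a two-level factor the interaction uses a single coefficient, so the t-test in the regression table and the nested F-test give the same p-value.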

Part c

Is it possible that diet is a confounder? Note: this will depend on your results from Part b.

Part d

Write the fitted regression equation for our model in Part b. Write the respective regression lines for each specific diet group: protein-rich and protein-poor. Interpret the slope of each regression line (include the 95% CI here).
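For reference, the general form of group-specific lines from an interaction model can be sketched as below. This assumes, for illustration only, that protein-poor is the reference group and \(I(\text{rich})\) is the indicator for the protein-rich group; your coding may differ.

\[
\begin{aligned}
\widehat{\text{Height}} &= \widehat{\beta}_0 + \widehat{\beta}_1 \text{Age} + \widehat{\beta}_2 I(\text{rich}) + \widehat{\beta}_3\, \text{Age} \cdot I(\text{rich}) \\
\text{Protein-poor: } \widehat{\text{Height}} &= \widehat{\beta}_0 + \widehat{\beta}_1 \text{Age} \\
\text{Protein-rich: } \widehat{\text{Height}} &= \left(\widehat{\beta}_0 + \widehat{\beta}_2\right) + \left(\widehat{\beta}_1 + \widehat{\beta}_3\right) \text{Age}
\end{aligned}
\]

Under this coding, the slope for the non-reference group is a sum of two estimates, so its confidence interval depends on \(\widehat{\text{Var}}(\widehat{\beta}_1 + \widehat{\beta}_3) = \widehat{\text{Var}}(\widehat{\beta}_1) + \widehat{\text{Var}}(\widehat{\beta}_3) + 2\,\widehat{\text{Cov}}(\widehat{\beta}_1, \widehat{\beta}_3)\) rather than on either standard error alone.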

Question 4

An experiment was conducted regarding a quantitative analysis of factors found in high-density lipoprotein (HDL) in a sample of human blood serum. Three variables thought to be predictive of, or associated with, HDL measurement (Y) were the total cholesterol (X1) and total triglyceride (X2) concentrations in the sample, plus the presence or absence of a certain sticky component of the serum called sinking pre-beta or SPB (X3), coded as 0 if absent and 1 if present. Using the dataset, CH09Q05.xls, answer the following questions.

Part a

Using \(\alpha = 0.05\), test whether there is a crude association between HDL measurement and total cholesterol. Note: testing for a crude association means we fit a simple linear regression model and see if the association is significant.

Part b

Sometimes simple linear regression leads us to believe that there is no association between two variables, but a missing interaction might be obscuring the association. Use \(\alpha = 0.1\) to test whether total triglyceride is an effect modifier of the association between HDL and total cholesterol. Make sure to include a concluding statement.

Note: Since the data frame has the variables named as \(Y\), \(X1\), and \(X2\), you may use those in the regression equations, but when you are making a conclusion, please use the specific names of the variables to identify each. For example, \(Y\) is actually HDL.

Part c

Test whether total triglyceride is a confounder by comparing the model in Part a to a model that includes total triglyceride. Make sure to include a concluding statement and interpret your coefficients.

Note

We have not covered how to check the model assumptions in a multiple linear regression, so you can skip that step.
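One common way to compare a crude and an adjusted model is to compute the percent change in the coefficient of interest. The sketch below uses simulated data with placeholder names (y, x1, x2). Both the formula (change relative to the adjusted estimate) and any cutoff, such as the often-cited 10%, are assumptions here; use the definition given in class if it differs.

```r
# Simulated data with placeholder variable names (not the homework data)
set.seed(9)
n <- 100
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n)            # x1 is correlated with x2
y  <- 1 + 0.5 * x1 + 0.8 * x2 + rnorm(n)
sim_df <- data.frame(y, x1, x2)

# Crude model (x1 only) and adjusted model (x1 and x2)
fit_crude <- lm(y ~ x1, data = sim_df)
fit_adj   <- lm(y ~ x1 + x2, data = sim_df)

b_crude <- coef(fit_crude)[["x1"]]
b_adj   <- coef(fit_adj)[["x1"]]

# Percent change in the x1 coefficient after adjusting for x2
pct_change <- 100 * (b_crude - b_adj) / b_adj
pct_change
```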