Homework 4

BSTA 512/612

Due: Sunday February 25, 2024 at 11pm
Author

Your name here!!!

Published

February 25, 2024

Modified

February 21, 2024

Important

This assignment is no longer under construction!! (2/15/2024)

Directions

  • Download the .qmd file here.

  • You will need to download the datasets. Use this link to download the homework datasets needed in this assignment. If you do not want to make changes to the paths set in this document, then make sure the files are stored in a folder named “data” that is housed in the same location as this homework .qmd file.

  • Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file

    • Please rename you homework as Lastname_Firstinitial_HW4.qmd. This will help organize the homeworks when the TAs grade them.

    • Please also add the following line under subtitle: "BSTA 512/612": author: First-name Last-name with your first and last name so it is attached to the viewable document.

  • For each question, make sure to include all code and resulting output in the html file to support your answers.

  • Show the work of your calculations using R code within a code chunk. Make sure that both your code and output are visible in the rendered html file. This is the default setting.

  • If you are computing something by hand, you may take a picture of your work and insert the image in this file. You may also use LaTeX to write it inline.

  • Write all answers in complete sentences as if communicating the results to a collaborator. This means including a sentence summarizing results in the context of the research study.

Tip

It is a good idea to try rendering your document from time to time as you go along! Note that rendering automatically saves your qmd file and rendering frequently helps you catch your errors more quickly.

Questions

Question 1

For this problem we will be using the penguins dataset from the palmerpenguins R package. We will look at the association between flipper length of penguins (measured in mm) and specific species of penguins.

Description from help file:

Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.

More info about the data.

# first install the palmerpenguins package
# install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)

# run the command below to learn more about the variables in the penguins dataset
# ?penguins

Part a

Calculate the average flipper lengths stratified by each of the penguin species.

Part b

Make a scatterplot (with jigger) of flipper lengths by species, and include diamond-shape points for the averages of the flipper lengths for each of the species.

Part c

Write out the fitted regression equation that models the flipper length by penguin species. Use LaTeX math markup or insert an image of your equation. Do not yet insert values for the regression coefficients, i.e. use the generic coefficients \(\widehat{\beta}_0, \widehat{\beta}_1\). Use Adelie as the reference level.

Part d

Run the linear regression of flipper lengths vs. species in R, and display the regression table output. Which species did R choose as the reference level, and how did you determine this?

Part e

How do we interpret each of the regression coefficients for this model? Write out a separate interpretation for each of the coefficients.

Part f

Calculate the mean flipper length (and the 95% CI) of penguins in the Chinstrap and Gentoo species using the predict() function.

Question 2

A team of nutrition experts investigated the influence of protein content in diet on the relationship between age (explanatory variable) and height (outcome, in centimeters) for children. Using the dataset, CH12Q03.xls, answer the following questions.

This question was adapted from this textbook.

Part a

Using R, make a variable that is a factor for Diet. Make sure to check what values the original variable for Diet can take. How many indicator functions do you need to represent the categorical variable Diet (protein-rich vs. protein-poor)?

Part b

At a level of significance \(\alpha = 0.10\), test whether protein diet modifies the effect of age on height. Justify your answer (e.g., perform a hypothesis test for the interaction between diet and age).

Note: recall that an effect modifier is an interaction.

Part c

Is it possible that diet is a confounder? Note: this will depend on your results from Part b.

Part d

Write the fitted regression equation for our model in Part b. Write the respective regression lines for each specific diet group: protein rich and protein poor. Interpret the slope of each regression line (no need for a 95% CI here).

Question 3

An experiment was conducted regarding a quantitative analysis of factors found in high-density lipoprotein (HDL) in a sample of human blood serum. Three variables thought to be predictive of, or associated with, HDL measurement (Y) were the total cholesterol (X1) and total triglyceride (X2) concentrations in the sample, plus the presence or absence of a certain sticky component of the serum called sinking pre-beta or SPB (X3), coded as 0 if absent and 1 if present. Using the dataset, CH09Q05.xls, answer the following questions.

Part a

Use \(\alpha= 0.05\), test whether if there is a crude association between HDL measurement and total cholesterol. Note: testing for a crude association means we fit a simple linear regression model and see if the association is significant.

Part b

Sometimes simple linear regression leads us to believe that there is no association between two variables, but missing interaction might be obscuring the association. Use \(\alpha= 0.1\) to test whether total triglyceride is an effect modifier of the association between HDL and total cholesterol.

Note: Since the data frame has the variables named as \(Y\), \(X1\), and \(X2\), you may use those in the regression equations, but when you are making a conclusion, please use the specific names of the variables to identify each. For example, \(Y\) is actually HDL.

Note: The plan is to cover interactions between two continuous covariates on Wednesday 2/21. We will not interpret the interaction coefficient for this, but we can test the interaction coefficient. Similar to the interaction for a binary covariate and a continuous covariate, we only need to test one coefficient, \(\beta_3\).

Part c

Is it possible that total triglyceride is a confounder? No need to test this explicity.

Note: this will depend on your results from Part b.