Homework 9

Author

Your name here - update this!!!!

Modified

July 11, 2025

Important

Homework is ready to be worked on!! (12/5/24)

Due 12/13/24

Directions

Please turn in this homework on Sakai. This homework must be written as a Quarto document, rendered to html, and submitted as the html file! I know past homeworks said pdf, but all Quarto docs should be rendered as html for this class!

You can download the .qmd file for this assignment from GitHub.

Tip

It is a good idea to try rendering your document from time to time as you go along! Note that rendering automatically saves your Qmd file and rendering frequently helps you catch your errors more quickly.

Book exercises

8.28 True or false, Part II

8.34 Coffee and Depression

0.0.1 (a)

0.0.2 (b)-(f)

Instead of doing part (b) - (f), please run a hypothesis test using the Chi-squared test.
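
As a reminder of the mechanics, here is a minimal sketch of a Chi-squared test of independence using base R's `chisq.test()`. The table below is made up for illustration: the counts and the names `toy_counts`, `outcome`, and `exposure` are hypothetical and are not the book's coffee data.

```r
# Made-up 2 x 3 table of counts (hypothetical values, NOT the book's data)
toy_counts <- matrix(
  c(30, 45, 25,
    20, 35, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(outcome  = c("yes", "no"),
                  exposure = c("low", "medium", "high"))
)

# Chi-squared test of independence between the row and column variables
chisq_res <- chisq.test(toy_counts)
chisq_res            # test statistic, degrees of freedom, and p-value
chisq_res$expected   # expected counts under the null hypothesis
```

For your answer, replace the toy table with the counts from the exercise and state the hypotheses and conclusion in context.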

0.0.3 (g)

5.46 Child care hours

5.48 True/False: ANOVA, Part II

1 R exercise

1.1 Load all the packages you need below here.

1.2 R1: Palmer Penguins ANOVA

  • Use the penguins data from the palmerpenguins package.
    • Don’t forget to first install the palmerpenguins package
  • You can learn more about the Palmer penguins data at https://allisonhorst.github.io/palmerpenguins/
  • We will test whether there are differences in penguins’ mean bill depths when comparing different species.
library(palmerpenguins)
data(penguins)

1.2.1 Dotplots

Make a dotplot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.
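
One possible way to build this kind of plot is sketched below with `ggplot2`. It uses R's built-in `PlantGrowth` data as a stand-in (not the penguins data), and it may differ in style from the class example; the object name `dotplot_sketch` is just a placeholder.

```r
library(ggplot2)

# Stand-in data: plant weights in three treatment groups (built into R)
data(PlantGrowth)

dotplot_sketch <- ggplot(PlantGrowth, aes(x = group, y = weight, color = group)) +
  # one dot per observation, jittered sideways so points do not overlap
  geom_jitter(width = 0.1, height = 0, alpha = 0.7) +
  # a larger black diamond at each group's mean
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4, color = "black") +
  # horizontal dashed line at the overall mean
  geom_hline(yintercept = mean(PlantGrowth$weight), linetype = "dashed") +
  labs(x = "Group", y = "Weight") +
  theme(legend.position = "none")

dotplot_sketch
```

With the penguins data, you would swap in the species and bill depth variables. If a variable has missing values, the overall mean needs `na.rm = TRUE`.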

1.2.2 Technical conditions

Investigate whether the technical conditions for using an ANOVA have been satisfied.
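
A rough sketch of checks that are commonly used for ANOVA conditions is shown below, again with the built-in `PlantGrowth` data as a stand-in. Which conditions to check, and any cutoffs for "similar" spreads, should follow the class notes; nothing here sets an official threshold.

```r
data(PlantGrowth)

# Sample size in each group
table(PlantGrowth$group)

# Standard deviation within each group, to compare the spreads
group_sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group_sds

# Ratio of the largest to the smallest group SD (closer to 1 = more similar)
sd_ratio <- max(group_sds) / min(group_sds)
sd_ratio

# Normal QQ plot of the residuals, to look for strong departures from normality
cond_fit <- aov(weight ~ group, data = PlantGrowth)
qqnorm(residuals(cond_fit))
qqline(residuals(cond_fit))
```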

1.2.3 Which groups are significantly different?

Based on the figure, which pairs of species look like they have significantly different mean bill depths?

1.2.4 Hypotheses in symbols or words

Write out in symbols or words the null and alternative hypotheses.

1.2.5 Run ANOVA in R

Using R, run the hypothesis test and display the output.
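
A minimal sketch of a one-way ANOVA in base R, using the built-in `PlantGrowth` data as a stand-in for the penguins data. The class notes may use a different function or a different way of displaying the table.

```r
data(PlantGrowth)

# One-way ANOVA: response ~ grouping variable
anova_fit <- aov(weight ~ group, data = PlantGrowth)

# ANOVA table: Df, Sum Sq, Mean Sq, F value, and p-value
summary(anova_fit)
```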

1.2.6 F statistic

Using the values from the ANOVA table, verify (calculate) the value of the F statistic.
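
The F statistic is the ratio of the two mean squares, \(F = MSG / MSE\), where each mean square is a sum of squares divided by its degrees of freedom. The sketch below checks this on the built-in `PlantGrowth` data (a stand-in, not the penguins data); the object names are placeholders.

```r
data(PlantGrowth)
f_fit <- aov(weight ~ group, data = PlantGrowth)

# The ANOVA table as a data frame: row 1 = groups, row 2 = residuals (error)
anova_tab <- summary(f_fit)[[1]]

# Mean square = sum of squares / degrees of freedom
ms_group <- anova_tab[["Sum Sq"]][1] / anova_tab[["Df"]][1]
ms_error <- anova_tab[["Sum Sq"]][2] / anova_tab[["Df"]][2]

# F statistic computed by hand, next to the value R reports
f_by_hand <- ms_group / ms_error
f_by_hand
anova_tab[["F value"]][1]
```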

1.2.7 Decision?

Based on the p-value, will we reject or fail to reject the null hypothesis? Why?

1.2.8 Conclusion

Write a conclusion to the hypothesis test in the context of the problem.

2 Nonparametric tests

2.1 NPT 1: (Wilcoxon) Signed-rank test

Vegetarian diet and cholesterol levels

When covering paired t-tests on Day 10 Part 2, the class notes used the example of testing whether a vegetarian diet changed cholesterol levels. The data are in the file chol213.csv at https://niederhausen.github.io/BSTA_511_F23/data/chol213.csv. In this exercise we will use non-parametric tests to test for a change and compare the results to the paired t-test.

2.1.1 Hypotheses

What are the hypotheses for the signed-rank test (2-sided) in the context of the problem?

2.1.2 Test in R

Run the (Wilcoxon) Signed-rank test in R. What is the p-value and how does it compare to the p-value of the sign test and the paired t-test (check the class notes for this)?
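
A minimal sketch of the signed-rank test with base R's `wilcox.test()`. The column names in chol213.csv are not given here, so the example uses two made-up vectors, `before` and `after`, whose values are hypothetical and are not the class data.

```r
# Made-up paired measurements (hypothetical values, NOT the chol213 data)
before <- c(195, 145, 205, 159, 244, 166, 250, 236)
after  <- c(146, 155, 178, 146, 208, 147, 202, 215)

# Two-sided Wilcoxon signed-rank test on the paired differences
signed_rank <- wilcox.test(before, after, paired = TRUE,
                           alternative = "two.sided")
signed_rank$p.value

# Paired t-test on the same data, for comparison
paired_t <- t.test(before, after, paired = TRUE)
paired_t$p.value
```

With the real data, R may warn that it cannot compute an exact p-value if some differences are tied or equal to zero; in that case it falls back to a normal approximation.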

8.38 (a) & (extra) Salt intake and CVD

Do not do parts (b)-(c) in the book.

(a)

  • You can use the expected cell counts from expected() in R (you do not need to compute them using the formula).
  • Comment on whether the sample size condition is met or not for these data.
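
If you prefer base R, the expected counts are also stored in the output of `chisq.test()`. The sketch below uses a made-up 2 x 2 table (hypothetical counts, not the book's salt data), and the cutoff `min_expected_cutoff` is an assumption for illustration; use the cutoff given in class.

```r
# Made-up 2 x 2 table (hypothetical counts, NOT the book's data)
toy_tab <- matrix(c( 4, 16,
                    10, 70),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(group   = c("A", "B"),
                                  outcome = c("yes", "no")))

# Expected counts under independence; small counts trigger a warning,
# which is suppressed here because only the expected counts are needed
expected_counts <- suppressWarnings(chisq.test(toy_tab))$expected
expected_counts

# Assumed cutoff for illustration only; replace with the one from class
min_expected_cutoff <- 5
all(expected_counts >= min_expected_cutoff)
```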

(extra)

Run a Fisher’s Exact test. Include the hypotheses and a conclusion in the context of the problem.
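
A minimal sketch of Fisher's Exact test with base R's `fisher.test()`, using a made-up 2 x 2 table (hypothetical counts, not the book's salt data).

```r
# Made-up 2 x 2 table (hypothetical counts, NOT the book's data)
fisher_tab <- matrix(c( 4, 16,
                       10, 70),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(group   = c("A", "B"),
                                     outcome = c("yes", "no")))

# Fisher's Exact test of association between the row and column variables
fisher_res <- fisher.test(fisher_tab)
fisher_res$p.value    # two-sided p-value
fisher_res$estimate   # estimated odds ratio
```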

3 Extra R exercises (optional)

3.1 R2: Palmer Penguins SLR

Important

Below I frequently use the terminology variable1 vs. variable2. When I write this, the first variable is \(y\) (vertical axis) and the second is \(x\) (horizontal axis). Thus it’s always \(y\) vs. \(x\) (NOT \(x\) vs. \(y\)).

3.1.1 Scatterplots

  • For each of the following pairs of variables, make a scatterplot showing the best fit line and describe the relationship between the variables.
  • In particular address
    • whether the association is linear,
    • how strong it is (based purely on the plot), and
    • what direction (positive, negative, or neither).
  1. body mass vs. flipper length

  2. bill depth vs. flipper length

  3. bill depth vs. bill length
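
A sketch of a scatterplot with a best fit line in `ggplot2`, using the built-in `cars` data as a stand-in for the penguins data. Following the \(y\) vs. \(x\) convention above, this is stopping distance vs. speed.

```r
library(ggplot2)

# Stand-in data: stopping distance (dist) and speed of cars (built into R)
data(cars)

scatter_sketch <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  # least-squares best fit line, without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Speed", y = "Stopping distance")

scatter_sketch
```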

3.1.2 Correlations

  • For each of the following pairs of variables, find the correlation coefficient \(r\).
  1. body mass vs. flipper length

  2. bill depth vs. flipper length

  3. bill depth vs. bill length
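
A sketch of computing \(r\) with base R's `cor()`, using the built-in `cars` data as a stand-in. It also shows what happens when a value is missing, since `cor()` returns `NA` unless told how to handle missing values; the vectors `y_with_na` and `x_with_na` are made up for that purpose.

```r
data(cars)

# Correlation between y (dist) and x (speed)
r_cars <- cor(cars$dist, cars$speed)
r_cars

# Add one artificial pair with a missing y value to show the NA behavior
y_with_na <- c(cars$dist, NA)
x_with_na <- c(cars$speed, 10)

r_na <- cor(y_with_na, x_with_na)                          # result is NA
r_complete <- cor(y_with_na, x_with_na, use = "complete.obs")  # drops incomplete pairs
r_complete
```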

3.1.3 Compare associations

Which pair of variables has the strongest association? Which has the weakest? Explain how you determined this.

3.1.4 Body mass vs. flipper length SLR

Run the simple linear regression model for body mass vs. flipper length, and display the regression table output.
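
A minimal sketch of fitting a simple linear regression with base R's `lm()`, using the built-in `cars` data as a stand-in. The class notes may display the regression table with a different function.

```r
data(cars)

# Simple linear regression: y ~ x (stopping distance vs. speed)
slr_fit <- lm(dist ~ speed, data = cars)

# Regression table: estimates, standard errors, t statistics, p-values
summary(slr_fit)

# Just the intercept and slope
coef(slr_fit)
```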

3.1.5 Regression equation

Write out the regression equation for this model, using the variable names instead of the generic \(x\) and \(y\), and inserting the regression coefficient values.

3.1.6 Interpret intercept

Write a sentence interpreting the intercept for this example. Is it meaningful in this example?

3.1.7 Interpret slope

Write a sentence interpreting the slope for this example.