Homework 8

Author

Your name here - update this!!!!

Directions

Please turn in this homework on Sakai. If you are using a Quarto doc to render your full homework, turn in both the rendered html file and the qmd file. If you are not rendering a Quarto document, please submit your homework in pdf format.

You can download the .qmd file for this assignment from GitHub if you want to work in a Quarto doc. You do not need to work in a Quarto doc for this homework!!

Tip

It is a good idea to try rendering your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.

Book exercises

4.22 Testing for food safety

Note from Nicky: I’m sorry. This problem was not supposed to be in HW 7… If you already did it, you can paste it here. Try to reflect and see if this problem is clearer now that we’ve covered power and sample size.

5.46 Child care hours

5.48 True/False: ANOVA, Part II

Skip parts c and d.

8.2 Young Americans, Part I

Skip part a!

8.8 Legalization of marijuana, Part I

Important: Additional instructions
  • (b): Calculate the CI both using the formula and using the appropriate R statistical test.
  • Add parts (e) & (f) as instructed below.

(e)

Test whether the proportion of US residents who think marijuana should be made legal is different from 0.586.

(f)

Are the results from the CI and the hypothesis test consistent? Why or why not?
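For parts (b) and (e), the general pattern of a formula-based CI next to R's one-sample proportion test might look like the sketch below. The counts `x` and `n` are made up for illustration and are not the values from the exercise; only the null value 0.586 comes from part (e). Note that for a single proportion, `prop.test()` reports a score (Wilson) interval, so expect it to be close to, but not identical to, the formula interval.

```r
# Hypothetical survey counts, NOT the values from exercise 8.8
x  <- 600     # respondents answering "yes" (made up)
n  <- 1000    # sample size (made up)
p0 <- 0.586   # null value from part (e)

p_hat <- x / n

# Formula (Wald) 95% CI: p_hat +/- z* x SE, with SE based on p_hat
se_ci      <- sqrt(p_hat * (1 - p_hat) / n)
z_star     <- qnorm(0.975)
ci_formula <- p_hat + c(-1, 1) * z_star * se_ci

# Two-sided test of H0: p = p0, without continuity correction
test_out <- prop.test(x, n, p = p0, correct = FALSE)
ci_r     <- as.numeric(test_out$conf.int)

print(ci_formula)
print(ci_r)
print(test_out$p.value)
```

One way to think about part (f) is to check whether 0.586 falls inside the interval and compare that with the test decision.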

8.14 2010 Healthcare Law

8.26 An apple a day keeps the doctor away

1 PSS

1.1 Power and Sample size: Auto exhaust and lead exposure revisited

1.1.1 Power

In exercise 5.12 in Homework 6, we tested whether police officers appear to have been exposed to a lead concentration higher than 35. Calculate the power of that hypothesis test and interpret it in the context of the research question. Was the test sufficiently powered?

1.1.2 Sample size

For the same test, what sample size would be needed for 80% power? Would it be reasonable to conduct the study with this sample size? Why or why not? (Hint: think about the assumptions of our distributions when using a t-test)

1.1.3 Effect size

Suppose the study has resources to include 30 people. What minimum effect size would they be able to detect with 85% power, assuming the same sample mean and standard deviation? Use \(\alpha = 0.05\).
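The three calculations above (power, sample size, detectable effect) can all be sketched with base R's `power.t.test()`, which solves for whichever argument is left out. The sample mean, standard deviation, and sample size below are made up and are not the values from exercise 5.12; only the null value 35, the power targets, and n = 30 come from the questions. The class may use a different function (for example `pwr.t.test()` from the pwr package), so treat this as one possible approach.

```r
# Hypothetical inputs, NOT the values from exercise 5.12
mu0  <- 35   # null value from the question
xbar <- 40   # sample mean (made up)
s    <- 10   # sample standard deviation (made up)
n    <- 20   # sample size (made up)

# 1. Power of the one-sided, one-sample t-test at alpha = 0.05
pow <- power.t.test(n = n, delta = xbar - mu0, sd = s, sig.level = 0.05,
                    type = "one.sample", alternative = "one.sided")

# 2. Sample size needed for 80% power (round up to a whole person)
ss <- power.t.test(power = 0.80, delta = xbar - mu0, sd = s, sig.level = 0.05,
                   type = "one.sample", alternative = "one.sided")
n_needed <- ceiling(ss$n)

# 3. Smallest detectable difference with n = 30 and 85% power
es <- power.t.test(n = 30, power = 0.85, sd = s, sig.level = 0.05,
                   type = "one.sample", alternative = "one.sided")
d_min <- es$delta / s   # the same difference on the standardized scale

print(pow$power)
print(n_needed)
print(c(delta = es$delta, d = d_min))
```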

2 R

2.1 R1: Palmer Penguins ANOVA

  • Use the penguins data from the palmerpenguins package.
    • Don’t forget to first install the palmerpenguins package
  • You can learn more about the Palmer penguins data at https://allisonhorst.github.io/palmerpenguins/
  • We will test whether there are differences in penguins’ mean bill depths when comparing different species.
library(palmerpenguins)

Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':

    penguins, penguins_raw
data(penguins)

2.1.1 Plots

Make a plot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.
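As a rough sketch of this kind of plot, the code below uses the built-in `iris` data as a stand-in (sepal width by species) so that it runs without extra packages. It uses base graphics, which may differ from the class example. To adapt it, swap in `penguins`, `bill_depth_mm`, and `species`, and add `na.rm = TRUE` to the means, since the penguins data contain some missing measurements.

```r
# Stand-in data: built-in iris, NOT the penguins data
group_means  <- tapply(iris$Sepal.Width, iris$Species, mean)
overall_mean <- mean(iris$Sepal.Width)

# One column of jittered points per group
stripchart(Sepal.Width ~ Species, data = iris,
           vertical = TRUE, method = "jitter", pch = 1,
           col = "grey40", ylab = "Sepal width (cm)")

# Group means as large filled diamonds (groups sit at x = 1, 2, 3)
points(seq_along(group_means), group_means, pch = 18, cex = 2, col = "red")

# Overall mean as a horizontal dashed line
abline(h = overall_mean, lty = 2)
```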

2.1.2 Assumptions

Investigate whether the assumptions for using an ANOVA have been satisfied.
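One way to start on the numerical and graphical checks is sketched below, again on the built-in `iris` data as a stand-in rather than the penguins data. It compares group sizes and standard deviations and draws a normal QQ plot per group. The cutoff you use for "similar enough" standard deviations should be the one from class, and independence has to be judged from how the data were collected, not from code.

```r
# Stand-in data: built-in iris, NOT the penguins data
group_n  <- tapply(iris$Sepal.Width, iris$Species, length)
group_sd <- tapply(iris$Sepal.Width, iris$Species, sd)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(group_sd) / min(group_sd)

print(group_n)
print(round(group_sd, 3))
print(round(sd_ratio, 2))

# Normal QQ plot for each group, side by side
op <- par(mfrow = c(1, 3))
for (sp in levels(iris$Species)) {
  y <- iris$Sepal.Width[iris$Species == sp]
  qqnorm(y, main = sp)
  qqline(y)
}
par(op)
```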

2.1.3 Which groups are significantly different?

Based on the figure, which pairs of species look like they have significantly different mean bill depths?

2.1.4 Hypotheses in symbols or words

Write out in symbols or words the null and alternative hypotheses.

2.1.5 Run ANOVA in R

Using R, run the hypothesis test and display the output.

2.1.6 F statistic

Using the values from the ANOVA table, verify (calculate) the value of the F statistic.
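A sketch of running the ANOVA and checking the F statistic by hand is below. It uses the built-in `iris` data as a stand-in, so the numbers are not the penguin results. The F statistic is the mean square between groups divided by the mean square error, and the p-value is the upper tail of the F distribution with the two degrees of freedom from the table.

```r
# Stand-in data: built-in iris, NOT the penguins data
fit <- aov(Sepal.Width ~ Species, data = iris)
tab <- summary(fit)[[1]]   # ANOVA table: row 1 = groups, row 2 = residuals
print(tab)

ms_groups <- tab[["Mean Sq"]][1]   # mean square between groups (MSG)
ms_error  <- tab[["Mean Sq"]][2]   # mean square error (MSE)

# F = MSG / MSE, compared with the value R reports
f_manual <- ms_groups / ms_error
f_table  <- tab[["F value"]][1]

# p-value: upper tail of F with (k - 1, n - k) degrees of freedom
p_manual <- pf(f_manual, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)

print(c(manual = f_manual, table = f_table))
print(p_manual)
```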

2.1.7 Decision?

Based on the p-value, will we reject or fail to reject the null hypothesis? Why?

2.1.8 Conclusion

Write a conclusion to the hypothesis test in the context of the problem.

2.2 R2: The Strong Heart Study

The Strong Heart Study is an ongoing study of American Indians residing in 13 tribal communities in three geographic areas (AZ, OK, and SD/ND) to study prevalence and incidence of cardiovascular disease and to identify risk factors. We will be examining the 4-year cumulative incidence of diabetes with one risk factor, glucose tolerance. We are curious if the proportion of individuals diagnosed with diabetes is different between glucose tolerances.

  • Impaired glucose: normal or impaired glucose tolerance at baseline visit (between 1988 and 1991)

  • Diabetes: Indicator of diabetes at follow-up visit (roughly four years after baseline) according to two-hour oral glucose tolerance test

The data are in SHS_data.csv located in the Data folder of the shared OneDrive folder. The following table summarizes the data:

                 Diabetes
Glucose     Not diabetic   Diabetic   Total
Impaired             334        198     532
Normal              1004        128    1132
Total               1338        326    1664
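If you prefer to work from the summary table rather than the csv file, the counts can be entered directly in R. This is a minimal sketch; the object name `shs` is just an illustrative choice.

```r
# Counts copied from the summary table (rows: glucose tolerance at baseline)
shs <- matrix(c( 334, 198,
                1004, 128),
              nrow = 2, byrow = TRUE,
              dimnames = list(Glucose  = c("Impaired", "Normal"),
                              Diabetes = c("Not diabetic", "Diabetic")))

# Add row and column totals to compare against the table
print(addmargins(shs))

# Sample proportion diagnosed with diabetes in each glucose group
prop_diabetic <- shs[, "Diabetic"] / rowSums(shs)
print(round(prop_diabetic, 3))
```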

2.2.1 Run a hypothesis test

Complete the hypothesis test to see if the proportion of individuals diagnosed with diabetes is different between glucose tolerances. (Reminder: Follow all steps and put your conclusion in context of the Strong Heart Study)

2.2.2 Calculate the confidence interval for the difference in proportions

Calculate and interpret the 95% confidence interval for the difference in proportions using the formula. Is it consistent with the CI from the R output of the hypothesis test? (Reminder: Make sure to check the assumptions!)
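A sketch of the formula-based interval next to R's output is below, using the counts from the summary table. It assumes the difference is taken as impaired minus normal and that the continuity correction is turned off (`correct = FALSE`); with the default `correct = TRUE`, R's interval comes out slightly wider than the formula interval. The interpretation and the assumption checks are still up to you.

```r
# Diabetic counts and group sizes from the summary table
x <- c(Impaired = 198, Normal = 128)    # diagnosed with diabetes
n <- c(Impaired = 532, Normal = 1132)   # group totals

p_hat    <- x / n
diff_hat <- p_hat[["Impaired"]] - p_hat[["Normal"]]

# Formula: (p1 - p2) +/- z* x sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)
se_diff    <- sqrt(sum(p_hat * (1 - p_hat) / n))
z_star     <- qnorm(0.975)
ci_formula <- diff_hat + c(-1, 1) * z_star * se_diff

# Two-sample test of proportions without continuity correction
test_out <- prop.test(x = x, n = n, correct = FALSE)
ci_r     <- as.numeric(test_out$conf.int)

print(round(ci_formula, 4))
print(round(ci_r, 4))
```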