library(palmerpenguins)
Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':
penguins, penguins_raw
data(penguins)

November 20, 2025
A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.
Write the hypotheses in words.
\(H_0:\) The restaurant meets food safety and sanitation regulations.
\(H_a:\) ???
What is a Type 1 error in this context?
What is a Type 2 error in this context?
The food safety inspector concludes that the restaurant meets food safety and sanitation regulations and the restaurant stays open when the restaurant is actually not safe.
Which error is more problematic for the restaurant owner? Why?
Which error is more problematic for the diners? Why?
As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.
Not given
Fail to reject
False
Not given
Skip part a!
FALSE.
FALSE
TRUE.
sample statistic
By hand: \[(0.5862038, 0.6343285)\]
Using R: \[(0.5856442, 0.6343456)\]
Yes, good approximation
Yes
Test whether the proportion of US residents who think marijuana should be made legal is different than 0.586.
By hand:
- \(z = 1.957066\)
- \(p\)-value \(= 0.05033966\)

Using R:
- \(p\)-value \(= 0.05341977\)
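
The by-hand \(p\)-value follows directly from the \(z\) statistic. A minimal sketch in base R, assuming a two-sided test (the question asks whether the proportion is "different than" 0.586) and the standard normal approximation:

```r
# Two-sided p-value from the by-hand z statistic reported above
z <- 1.957066
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_value  # approximately 0.0503
```

The R answer differs slightly from the by-hand value, which is what one would expect if the R test applies a continuity correction.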
Are the results from CI and hypothesis test consistent? Why or why not?
Yes!
FALSE.
TRUE.
FALSE.
FALSE.
No
In exercise 5.12 in Homework 6, we tested whether police officers appear to have been exposed to a higher concentration of lead than 35. Calculate the power for the hypothesis test and include an interpretation of the power in the context of the research question. Was it sufficiently powered?
Yes, power is ~1.
For the same test, what sample size would be needed for 80% power? Would it be reasonable to conduct the study with these sample sizes? Why or why not? (Hint: think about the assumptions of our distributions when using a t-test)
3
Suppose the study has resources to include 30 people. What minimum effect size would they be able to detect with 85% power, assuming the same sample mean and standard deviation? Use \(\alpha\) = 0.05.
0.5013972
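
One way to obtain a minimum detectable effect size like this is sketched below. It uses base R's `power.t.test()` rather than whatever function the course uses, and it assumes a one-sample, one-sided test (the research question asks about a concentration *higher* than 35). Setting `sd = 1` is a device of this sketch so that the solved `delta` is a standardized effect size (Cohen's d).

```r
# Solve for the smallest detectable standardized effect size (Cohen's d).
# With sd = 1, delta equals d = (mu - mu_0) / s.
res <- power.t.test(
  n = 30, sd = 1, sig.level = 0.05, power = 0.85,
  type = "one.sample", alternative = "one.sided"
)
res$delta
```

Leaving `delta` unspecified tells `power.t.test()` to solve for it from the other four quantities. Leaving `n` unspecified instead would solve for the required sample size.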
This problem uses the penguins data from the palmerpenguins package.
Make a dotplot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.
We can use boxplots.
The following plot is not complete. You need to add the boxplots to it. I wanted to give you the code for the species-specific means and the grand sample mean.
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_summary()`).
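
The plot code itself did not survive on this page. The following is a sketch of what a completed version might look like, not the instructor's original code; the object names `peng`, `grand_mean`, and `p`, the colors, and the point size are all choices made for this example. It assumes the ggplot2 and palmerpenguins packages are installed.

```r
library(ggplot2)
library(palmerpenguins)

# Drop the 2 penguins with missing bill depth up front,
# which avoids the "Removed 2 rows" warning
peng <- subset(penguins, !is.na(bill_depth_mm))
grand_mean <- mean(peng$bill_depth_mm)

p <- ggplot(peng, aes(x = species, y = bill_depth_mm)) +
  geom_boxplot() +                                           # boxplot per species
  stat_summary(fun = mean, geom = "point",
               color = "red", size = 3) +                    # species-specific means
  geom_hline(yintercept = grand_mean, linetype = "dashed") + # grand sample mean
  labs(x = "Species", y = "Bill depth (mm)")
p
```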
Investigate whether the assumptions for using an ANOVA have been satisfied.
Not given
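
A possible starting point for the assumption check is sketched below. It only looks at group sizes and the similarity of the group standard deviations; the "largest SD less than twice the smallest" threshold is a common rule of thumb assumed here, not something stated on this page. Independence and approximate normality within groups still need to be judged from the study design and the plot.

```r
library(palmerpenguins)

peng <- subset(penguins, !is.na(bill_depth_mm))

# Group sizes and standard deviations by species
n_by_species  <- table(peng$species)
sd_by_species <- tapply(peng$bill_depth_mm, peng$species, sd)

# Rule of thumb (an assumption of this sketch): variability is similar
# enough if the largest SD is less than twice the smallest
sd_ratio <- max(sd_by_species) / min(sd_by_species)

n_by_species
sd_by_species
sd_ratio
```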
Based on the figure, which pairs of species look like they have significantly different mean bill depths?
Write out in symbols or words the null and alternative hypotheses.
\[H_0: \mu_{Adelie} = \mu_{Chinstrap} = \mu_{Gentoo}\] \[H_A: ???\]
Using R, run the hypothesis test and display the output.
Not given
Using the values from the ANOVA table, verify (calculate) the value of the F statistic.
Based on the p-value, will we reject or fail to reject the null hypothesis? Why?
\(p\)-value \(= 1.507658 \times 10^{-84}\)
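
The steps above (running the test, verifying the F statistic from the table, and reading off the \(p\)-value) can be sketched with base R's `aov()`. This is one possible approach, not necessarily the one used in class; the names `fit`, `tab`, `f_by_hand`, and `p_value` are chosen for this example.

```r
library(palmerpenguins)

fit <- aov(bill_depth_mm ~ species, data = penguins)  # rows with NA are dropped
tab <- summary(fit)[[1]]                              # ANOVA table as a data frame
tab

# F = (mean square between species) / (mean square of residuals)
f_by_hand <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]

# Upper-tail probability of the F distribution with the table's degrees of freedom
p_value <- pf(f_by_hand, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)
c(F = f_by_hand, p = p_value)
```

Each mean square is its sum of squares divided by its degrees of freedom, so the same check can be done starting from the `Sum Sq` and `Df` columns.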
Write a conclusion to the hypothesis test in the context of the problem.
Not given
The Strong Heart Study is an ongoing study of American Indians residing in 13 tribal communities in three geographic areas (AZ, OK, and SD/ND) to study prevalence and incidence of cardiovascular disease and to identify risk factors. We will be examining the 4-year cumulative incidence of diabetes with one risk factor, glucose tolerance. We are curious if the proportion of individuals diagnosed with diabetes is different between glucose tolerances.
Impaired glucose: normal or impaired glucose tolerance at baseline visit (between 1988 and 1991)
Diabetes: Indicator of diabetes at follow-up visit (roughly four years after baseline) according to two-hour oral glucose tolerance test
The data are in SHS_data.csv located in the Data folder of the shared OneDrive folder. The following table summarizes the data:
| Glucose | Not diabetic | Diabetic | Total |
|---|---|---|---|
| Impaired | 334 | 198 | 532 |
| Normal | 1004 | 128 | 1132 |
| Total | 1338 | 326 | 1664 |
Complete the hypothesis test to see if the proportion of individuals diagnosed with diabetes is different between glucose tolerances. (Reminder: Follow all steps and put your conclusion in context of the Strong Heart Study)
By hand:
- \(z = 12.4420859\)
- \(p\)-value \(= 1.5441 \times 10^{-35}\)

Using R:
- Not part of the answer, but to check: X-squared = 153.16
- \(p\)-value \(< 2.2 \times 10^{-16}\)
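
As a sketch of how the by-hand test statistic is set up from the summary table, assuming the usual pooled two-proportion \(z\)-test (the subscripts \(I\) for impaired and \(N\) for normal are labels chosen here):

\[\hat{p}_I = \frac{198}{532} \approx 0.372, \qquad \hat{p}_N = \frac{128}{1132} \approx 0.113, \qquad \hat{p}_{pool} = \frac{198 + 128}{532 + 1132} = \frac{326}{1664} \approx 0.196\]

\[z = \frac{\hat{p}_I - \hat{p}_N}{\sqrt{\hat{p}_{pool}\,(1 - \hat{p}_{pool})\left(\frac{1}{532} + \frac{1}{1132}\right)}}\]

Carrying more decimal places in the intermediate proportions than shown here will change the final digits of \(z\).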
Calculate and interpret the 95% confidence interval for the difference in proportions using the formula. Is it consistent with CI from the R output of the hypothesis test? (Reminder: Make sure to check the assumptions!)
By hand: \[(0.196, 0.324)\]
Using R: \[(0.213, 0.306)\]
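
For reference, a sketch of the formula-based interval, assuming the standard large-sample confidence interval for a difference in proportions with unpooled standard error, where \(\hat{p}_I = 198/532\) and \(\hat{p}_N = 128/1132\) are the diabetes proportions in the impaired and normal groups (subscripts chosen here):

\[(\hat{p}_I - \hat{p}_N) \pm 1.96 \sqrt{\frac{\hat{p}_I (1 - \hat{p}_I)}{532} + \frac{\hat{p}_N (1 - \hat{p}_N)}{1132}}\]

The success-failure condition can be checked directly from the table: each group has at least 10 diabetic and at least 10 non-diabetic participants (198 and 334 in the impaired group, 128 and 1004 in the normal group).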