Homework 8 Answers

Modified: November 20, 2025

Book exercises

4.22 Testing for food safety

A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

(a)

Write the hypotheses in words.

\(H_0:\) The restaurant meets food safety and sanitation regulations.
\(H_a:\) ???

(b)

What is a Type 1 Error in this context?

(c)

What is a Type 2 Error in this context?

The food safety inspector concludes that the restaurant meets food safety and sanitation regulations and the restaurant stays open when the restaurant is actually not safe.

(d)

Which error is more problematic for the restaurant owner? Why?

(e)

Which error is more problematic for the diners? Why?

(f)

As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.

5.46 Child care hours

(a)

Not given

(b)

Fail to reject

5.48 True/False: ANOVA, Part II

(a)

False

(b)

Not given

8.2 Young Americans, Part I

Skip part a!

(b)

FALSE.

(c)

FALSE

(d)

TRUE.

8.8 Legalization of marijuana, Part I

Important: Additional instructions
  • (b): Calculate the CI both using the formula and using the appropriate R statistical test.
  • Add parts (e) & (f) as instructed below.

(a)

sample statistic

(b)

By hand: \[(0.5862038, 0.6343285)\]

Using R: \[(0.5856442, 0.6343456)\]
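A minimal sketch of how both intervals in (b) could be reproduced in R. The counts are assumptions that this answer key does not state: n = 1578 respondents and x = 963 in favor (about 61%), taken from the textbook exercise.

```r
# Assumed counts from the textbook exercise (not stated in this key)
n <- 1578   # number of respondents
x <- 963    # respondents in favor, about 61% of n

p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)

# 95% CI by the formula: p_hat +/- z* times SE
ci_hand <- p_hat + c(-1, 1) * qnorm(0.975) * se
ci_hand

# 95% CI from R's one-sample proportion test
# (prop.test applies a continuity correction by default)
ci_r <- prop.test(x, n, conf.level = 0.95)$conf.int
ci_r
```

The two intervals are not expected to agree exactly, since `prop.test` does not use the plain formula above.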

(c)

Yes, good approximation

(d)

Yes

(e)

Test whether the proportion of US residents who think marijuana should be made legal is different from 0.586.

By hand:
  • \(z = 1.957066\)
  • \(p\)-value \(= 0.05033966\)

Using R:
  • \(p\)-value \(= 0.05341977\)
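A hedged sketch of the test in (e), both by the formula and with `prop.test`. The counts n = 1578 and x = 963 are assumptions taken from the textbook exercise. Only the null value 0.586 comes from the text above.

```r
# Assumed counts from the textbook exercise (not stated in this key)
n  <- 1578
x  <- 963
p0 <- 0.586   # null value given in part (e)

p_hat <- x / n

# By hand: the standard error uses the null proportion p0
se0 <- sqrt(p0 * (1 - p0) / n)
z <- (p_hat - p0) / se0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
c(z = z, p_value = p_value)

# Using R: two-sided one-sample proportion test
# (continuity correction is on by default, so the p-value is a bit larger)
p_value_r <- prop.test(x, n, p = p0, alternative = "two.sided")$p.value
p_value_r
```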

(f)

Are the results from the CI and the hypothesis test consistent? Why or why not?

Yes!

8.14 2010 Healthcare Law

(a)

FALSE.

(b)

TRUE.

(c)

FALSE.

(d)

FALSE.

8.26 An apple a day keeps the doctor away

No

PSS

Power and Sample size: Auto exhaust and lead exposure revisited

Power

In exercise 5.12 in Homework 6, we tested whether police officers appear to have been exposed to a higher concentration of lead than 35. Calculate the power for the hypothesis test and include an interpretation of the power in the context of the research question. Was it sufficiently powered?

Yes, power is ~1.

Sample size

For the same test, what sample size would be needed for 80% power? Would it be reasonable to conduct the study with these sample sizes? Why or why not? (Hint: think about the assumptions of our distributions when using a t-test)

3
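One way the power and sample size answers might be checked is with base R's `power.t.test`. This is a sketch only: the summary statistics (n = 52, sample mean 124.32, standard deviation 37.74) are assumed from exercise 5.12 and are not restated in this key, and the function used in class may differ.

```r
# Assumed summary statistics from exercise 5.12 (not restated in this key)
n_obs <- 52
xbar  <- 124.32
s     <- 37.74
mu0   <- 35      # null value given in the question

# Power of the one-sided, one-sample t-test at alpha = 0.05
pow <- power.t.test(n = n_obs, delta = xbar - mu0, sd = s,
                    sig.level = 0.05, type = "one.sample",
                    alternative = "one.sided")
pow$power

# Sample size needed for 80% power with the same effect
ss <- power.t.test(power = 0.80, delta = xbar - mu0, sd = s,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")
ceiling(ss$n)   # round up to a whole number of people
```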

Effect size

Suppose the study has resources to include 30 people. What minimum effect size would they be able to detect with 85% power, assuming the same sample mean and standard deviation? Use \(\alpha = 0.05\).

0.5013972
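A minimal sketch of how the detectable effect size could be obtained with base R's `power.t.test`, assuming a one-sided, one-sample t-test as in the original exercise. Setting `sd = 1` is a choice made here so that the solved `delta` is on the standardized (Cohen's d) scale.

```r
# Solve for the effect size given n, power, and alpha.
# With sd = 1, delta is the standardized effect size d.
es <- power.t.test(n = 30, power = 0.85, sd = 1,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")
es$delta
```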

R

R1: Palmer Penguins ANOVA

  • Use the penguins data from the palmerpenguins package.
    • Don’t forget to first install the palmerpenguins package
  • You can learn more about the Palmer penguins data at https://allisonhorst.github.io/palmerpenguins/
  • We will test whether there are differences in penguins’ mean bill depths when comparing different species.
library(palmerpenguins)

Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':

    penguins, penguins_raw
data(penguins)

Dotplots

Make a dotplot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.

We can use boxplots.

The following plot is not complete. You need to add the boxplots to it. I wanted to give you the code for the species-specific means and the grand sample mean.

library(ggplot2)

# Species-specific means (grey points) and the grand mean (dashed line);
# the boxplot layer is left for you to add.
ggplot(penguins,
       aes(x = species,
           y = bill_depth_mm,
           fill = species,
           color = species)) +
  geom_hline(aes(yintercept = mean(bill_depth_mm, na.rm = TRUE)),
             lty = "dashed") +
  stat_summary(fun = "mean",
               geom = "point",
               size = 3,
               color = "grey33",
               alpha = 1) +
  theme(legend.position = "none")
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_summary()`).

Assumptions

Investigate whether the assumptions for using an ANOVA have been satisfied.

Not given

Which groups are significantly different?

Based on the figure, which pairs of species look like they have significantly different mean bill depths?

Hypotheses in symbols or words

Write out in symbols or words the null and alternative hypotheses.

In symbols

\[H_0: \mu_{Adelie} = \mu_{Chinstrap} = \mu_{Gentoo}\] \[H_A: ???\]

Run ANOVA in R

Using R, run the hypothesis test and display the output.

Not given

F statistic

Using the values from the ANOVA table, verify (calculate) the value of the F statistic.

  • Directly with non-reproducible code (value will be slightly off due to rounding):
# the value will be slightly off due to rounding
(F_approx <- 451.9/1.26)
[1] 358.6508
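A reproducible version of the same check might pull the mean squares straight from the ANOVA table instead of typing rounded values. This is a sketch, assuming the model `bill_depth_mm ~ species` and the palmerpenguins data. The object names `fit`, `tab`, `msg`, and `mse` are illustrative.

```r
library(palmerpenguins)

# One-way ANOVA of bill depth by species
# (rows with missing bill depth are dropped by default)
fit <- aov(bill_depth_mm ~ species, data = palmerpenguins::penguins)
tab <- summary(fit)[[1]]      # ANOVA table as a data frame

msg <- tab[["Mean Sq"]][1]    # mean square between groups
mse <- tab[["Mean Sq"]][2]    # mean square error (within groups)

# F statistic = MSG / MSE, without rounding error
F_stat <- msg / mse
F_stat

# p-value from the F distribution with (df_groups, df_error) degrees of freedom
p_val <- pf(F_stat, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)
p_val
```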

Decision?

Based on the p-value, will we reject or fail to reject the null hypothesis? Why?

\(p\)-value \(= 1.507658 \times 10^{-84}\)

Conclusion

Write a conclusion to the hypothesis test in the context of the problem.

Not given

R2: The Strong Heart Study

The Strong Heart Study is an ongoing study of American Indians residing in 13 tribal communities in three geographic areas (AZ, OK, and SD/ND) to study prevalence and incidence of cardiovascular disease and to identify risk factors. We will be examining the 4-year cumulative incidence of diabetes with one risk factor, glucose tolerance. We are curious if the proportion of individuals diagnosed with diabetes is different between glucose tolerances.

  • Impaired glucose: normal or impaired glucose tolerance at baseline visit (between 1988 and 1991)

  • Diabetes: Indicator of diabetes at follow-up visit (roughly four years after baseline) according to two-hour oral glucose tolerance test

The data are in SHS_data.csv located in the Data folder of the shared OneDrive folder. The following table summarizes the data:

Glucose     Diabetes                    Total
            Not diabetic   Diabetic
Impaired         334          198         532
Normal          1004          128        1132
Total           1338          326        1664

Run a hypothesis test

Complete the hypothesis test to see if the proportion of individuals diagnosed with diabetes is different between glucose tolerances. (Reminder: Follow all steps and put your conclusion in context of the Strong Heart Study)

By hand:
  • \(z = 12.4420859\)
  • \(p\)-value \(= 1.5441 \times 10^{-35}\)

Using R:
  • Not part of answer, but to check: X-squared = 153.16
  • \(p\)-value \(< 2.2 \times 10^{-16}\)
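The mechanics of both approaches can be sketched from the counts in the summary table. This is an illustration under that assumption. Because it starts from the table and not from SHS_data.csv, its output need not match the reported values to every digit.

```r
# Counts taken from the summary table above
x <- c(impaired = 198, normal = 128)    # diabetic at follow-up
n <- c(impaired = 532, normal = 1132)   # group totals

p_hat  <- x / n
p_pool <- sum(x) / sum(n)               # pooled proportion under H0

# By hand: two-sample z-test with the pooled standard error
se0 <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z <- (p_hat[["impaired"]] - p_hat[["normal"]]) / se0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
c(z = z, p_value = p_value)

# Using R: two-sample proportion test (continuity correction by default);
# the output reports X-squared, the p-value, and a CI for the difference
res <- prop.test(x, n)
res
```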

Calculate the confidence interval for the difference in proportions

Calculate and interpret the 95% confidence interval for the difference in proportions using the formula. Is it consistent with the CI from the R output of the hypothesis test? (Reminder: Make sure to check the assumptions!)

By hand: \[(0.196, 0.324)\]

Using R: \[(0.213, 0.306)\]