variable | n | min | max | median | iqr | mean | sd | se | ci |
---|---|---|---|---|---|---|---|---|---|
velocity.diff | 12 | 0.05 | 0.11 | 0.08 | 0.042 | 0.078 | 0.022 | 0.006 | 0.014 |
Homework 7 Answers
To see my math equations properly, you need to download the html file, then open it! One Drive does not show the math correctly!!
Book exercises
5.16 Paired or not, Part II
In each of the following scenarios, determine if the data are paired.
(a)
We would like to know if Intel’s stock and Southwest Airlines’ stock have similar rates of return. To find out, we take a random sample of 50 days, and record Intel’s and Southwest’s stock on those same days.
Paired
(b)
We randomly sample 50 items from Target stores and note the price for each. Then we visit Walmart and collect the price for each of those same 50 items.
Paired
(c)
A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district. To check, they take a simple random sample of 100 students from each high school.
Not paired
5.22 DDT exposure
Suppose that you are interested in determining whether exposure to the organochloride DDT, which has been used extensively as an insecticide for many years, is associated with breast cancer in women. As part of a study that investigated this issue, blood was drawn from a sample of women diagnosed with breast cancer over a six-year period and a sample of healthy control subjects matched to the cancer patients on age, menopausal status, and date of blood donation. Each woman’s blood level of DDE (an important byproduct of DDT in the human body) was measured, and the difference in levels for each patient and her matched control calculated. A sample of 171 such differences has mean \(\bar{d} = 2.7\) ng/mL and standard deviation \(s_{d} = 15.9\) ng/mL. Differences were calculated as \(DDE_{cancer} - DDE_{control}\).
(a)
Test the null hypothesis that the mean blood levels of DDE are identical for women with breast cancer and for healthy control subjects. What do you conclude?
\[t_{\bar{d}} = 2.22057\]
\[p-value = 0.02770171\]
Reject the null
(b)
Would you expect a 95% confidence interval for the true difference in population mean DDE levels to contain the value 0?
Does not contain 0
5.34 Placebos without deception
While placebo treatment can influence subjective symptoms, it is typically believed that patient response to placebo requires concealment or deception; in other words, a patient must believe that they are receiving an effective treatment in order to experience the benefits of being treated with an inert substance. Researchers recruited patients suffering from irritable bowel syndrome (IBS) to test whether placebo responses are neutralized by awareness that the treatment is a placebo.
Patients were randomly assigned to either the treatment arm or control arm. Those in the treatment arm were given placebo pills, which were described as “something like sugar pills, which have been shown in rigorous clinical testing to produce significant mind-body self-healing processes”. Those in the control arm did not receive treatment. At the end of the study, all participants answered a questionnaire called the IBS Global Improvement Scale (IBS-GIS) which measures whether IBS symptoms have improved; higher scores are indicative of more improvement.
At the end of the study, the 37 participants in the open placebo group had IBS-GIS scores with \(\bar{x} = 5.0\) and \(s = 1.5\), while the 43 participants in the no treatment group had IBS-GIS scores with \(\bar{x} = 3.9\) and \(s = 1.3\).
Based on an analysis of the data, summarize whether the study demonstrates evidence that placebos administered without deception may be an effective treatment for IBS.
\[t_{\bar{d}} = 3.47654\]
\[p-value = 0.00134\]
Reject the null
4.22 Testing for food safety
A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.
(a)
Write the hypotheses in words.
\(H_0:\) The restaurant meets food safety and sanitation regulations.
\(H_a:\) ???
(b)
What is a Type~1 Error in this context?
(c)
What is a Type~2 Error in this context?
The food safety inspector concludes that the restaurant meets food safety and sanitation regulations and the restaurant stays open when the restaurant is actually not safe.
(d)
Which error is more problematic for the restaurant owner? Why?
(e)
Which error is more problematic for the diners? Why?
(f)
As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.
Non-book exercises
R1: Swim times
- In these exercises you will use R to work through the swim times example from Section 5.2 in the textbook.
- The data are in the
oibiostats
package and calledswim
.
Mean & SD of differences
Calculate the mean and standard deviation for the differences in swim times, and compare them to the ones in the book. Which order were the differences calculated, wet suit - swim suit or the opposite? Were all the differences positive?
Histogram of differences
Create a histogram of the differences in swim times and comment on the distribution shape.
Hypothesis test
Run the appropriate statistical test in R to verify the test statistic, p-value, and CI in the textbook.
estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
0.0775 | 12.31815 | 8.885414e-08 | 11 | 0.06365244 | 0.09134756 | Paired t-test | two.sided |
R2: 2-sample independent t-test
This problem was adapted from Dr. Maria Tackett’s Intro to Data Science homework.
The dataset is adapted from Little et al. (2007), and contains voice measurements from individuals both with and without Parkinson’s Disease (PD), a progressive neurological disorder that affects the motor system. The aim of Little et al.’s study was to examine whether Parkinson’s Disease could be diagnosed by examining the spectral (sound-wave) properties of patients’ voices.
147 measurements were taken from patients with PD, and 48 measurements were taken from healthy patients who served as controls. For the purposes of this assignment, you may assume that measurements are representative of the underlying populations (PD vs. healthy).
The variables in the dataset are as follows:
clip
: ID of the recording numberjitter
: a measure of variation in fundamental frequencyshimmer
: a measure of variation in amplitudehnr
: a ratio of total components vs. noise in the voice recordingstatus
: PD vs. Healthyavg.f.q
: 1, 2, or 3, corresponding to average vocal fundamental frequency- 1 = low,
- 2 = mid
- 3 = high
The data are in parkinsons.csv
located in the Data
folder of the shared OneDrive folder.
We will be focusing on the variable HNR. We will see if there is evidence that the mean HNR is different for people with PD and people without PD.
= read.csv(here("data", "parkinsons.csv")) park
Histogram of HNR
Use histograms to visualize the distribution for HNR, comparing people with and without PD.
Mean & SD of HNR
Calculate the mean and standard deviation for HNR in the voice recordings of adults with and without Parkinson’s disease.
status | variable | n | min | max | median | iqr | mean | sd | se | ci |
---|---|---|---|---|---|---|---|---|---|---|
Healthy | hnr | 48 | 17.883 | 33.047 | 24.997 | 3.146 | 24.679 | 3.435 | 0.496 | 0.997 |
PD | hnr | 147 | 8.441 | 29.928 | 21.414 | 5.383 | 20.974 | 4.339 | 0.358 | 0.707 |
Hypothesis test
Run the appropriate statistical test in R. Please include all steps in the hypothesis test!
\[t_{\overline{x}_{PD} - \overline{x}_{NPD}} = 6.0592\]
By hand, the p-value was \(1.1\times 10^{-7}\). Using R, the p-value was \(2\times 10^{-8}\).
Reject the null