TB sections 5.5
2024-12-02
Revisit data visualization for a numeric outcome and categorical variable (from Lesson 8).
Understand the different measures of variability within an Analysis of Variance (ANOVA) table.
Understand the F-statistic and the F-distribution, which are used to measure the ratio of between-group to within-group variability.
Determine whether group means differ from one another using a hypothesis test and the F-distribution.
What happens when we want to compare two or more groups’ means?
Whether or not two means are significantly different depends on how far apart they are relative to the variability of the observations within each group.
Analysis of Variance (ANOVA) compares the variability between groups to the variability within groups
\[\sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 \ \ = \ \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2 \ \ + \ \ \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2\]
Observation | i = 1 | i = 2 | i = 3 | \(\ldots\) | i = k | overall |
---|---|---|---|---|---|---|
j = 1 | \(x_{11}\) | \(x_{21}\) | \(x_{31}\) | \(\ldots\) | \(x_{k1}\) | |
j = 2 | \(x_{12}\) | \(x_{22}\) | \(x_{32}\) | \(\ldots\) | \(x_{k2}\) | |
j = 3 | \(x_{13}\) | \(x_{23}\) | \(x_{33}\) | \(\ldots\) | \(x_{k3}\) | |
j = 4 | \(x_{14}\) | \(x_{24}\) | \(x_{34}\) | \(\ldots\) | \(x_{k4}\) | |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | |
j = \(n_i\) | \(x_{1n_1}\) | \(x_{2n_2}\) | \(x_{3n_3}\) | \(\ldots\) | \(x_{kn_k}\) | |
Means | \(\bar{x}_{1}\) | \(\bar{x}_{2}\) | \(\bar{x}_{3}\) | \(\ldots\) | \(\bar{x}_{k}\) | \(\bar{x}\) |
Variance | \({s}^2_{1}\) | \({s}^2_{2}\) | \({s}^2_{3}\) | \(\ldots\) | \({s}^2_{k}\) | \({s}^2\) |
Total Sums of Squares:
\[SST = \sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 = (N-1)s^2\]
where \(N = n_1 + n_2 + \cdots + n_k\) is the total number of observations and \(s^2\) is the sample variance of all \(N\) observations.
This is the sum of the squared differences between each observed \(x_{ij}\) value and the grand mean, \(\bar{x}\).
That is, it is the total deviation of the \(x_{ij}\)’s from the grand mean.
Sums of Squares due to Groups:
\[SSG = \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2\]
This is the sum of the squared differences between each group mean, \(\bar{x}_{i}\), and the grand mean, \(\bar{x}\).
That is, it is the deviation of the group means from the grand mean.
Also called the Model SS, or \(SS_{model}.\)
Sums of Squares Error:
\[SSE = \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2 = \sum_{i = 1}^k(n_i-1)s_{i}^2\] where \(s_{i}\) is the standard deviation of the \(i^{th}\) group
This is the sum of the squared differences between each observed \(x_{ij}\) value and its group mean \(\bar{x}_{i}\).
That is, it is the deviation of the \(x_{ij}\)’s from their group means (in our example, the value of ndrm.ch predicted for each observation by its genotype group).
Also called the residual sums of squares, or \(SS_{residual}.\)
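The decomposition SST = SSG + SSE can be checked numerically. A minimal base R sketch on hypothetical toy data (three groups of three observations; all names and values are illustrative):

```r
# Hypothetical toy data: k = 3 groups, n_i = 3 observations each
x <- list(g1 = c(10, 12, 14), g2 = c(20, 22, 24), g3 = c(15, 17, 19))
all_x <- unlist(x)
xbar  <- mean(all_x)  # grand mean

SST <- sum((all_x - xbar)^2)                                       # total SS
SSG <- sum(sapply(x, function(g) length(g) * (mean(g) - xbar)^2))  # between-group SS
SSE <- sum(sapply(x, function(g) sum((g - mean(g))^2)))            # within-group SS

SST        # 174
SSG + SSE  # 174, confirming SST = SSG + SSE
```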
If the groups are actually different, then which of these is more accurate?
If there really is a difference between the groups, we would expect the F-statistic to be which of these:
\[F_{stat} = \dfrac{MSG}{MSE}\]
where \(MSG = SSG/(k-1)\) is the mean square for groups and \(MSE = SSE/(N-k)\) is the mean square error. Under \(H_0\), \(F_{stat}\) follows an F-distribution with \(k-1\) and \(N-k\) degrees of freedom.
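As a sketch, the F-statistic and its p-value can be computed directly from the sums of squares in base R, here on hypothetical toy data (three groups of three; all names are illustrative):

```r
# Hypothetical toy data: N = 9 observations in k = 3 groups
x <- c(10, 12, 14, 20, 22, 24, 15, 17, 19)
g <- factor(rep(c("A", "B", "C"), each = 3))
k <- nlevels(g); N <- length(x); xbar <- mean(x)

SSG <- sum(tapply(x, g, function(v) length(v) * (mean(v) - xbar)^2))
SSE <- sum(tapply(x, g, function(v) sum((v - mean(v))^2)))

MSG <- SSG / (k - 1)  # between-group mean square
MSE <- SSE / (N - k)  # within-group mean square
F_stat <- MSG / MSE   # 75 / 4 = 18.75

# p-value: upper tail of the F-distribution with k-1 and N-k df
p_value <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
```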
Check the assumptions
Set the level of significance \(\alpha\)
Specify the null ( \(H_0\) ) and alternative ( \(H_A\) ) hypotheses
Calculate the test statistic
Calculate the p-value based on the observed test statistic and its sampling distribution
Write a conclusion to the hypothesis test
The sampling distribution is an F-distribution, if…
library(oibiostat)  # provides the famuss data
library(dplyr)
data(famuss)

genotype_groups <- famuss %>%
  group_by(actn3.r577x) %>%
  summarise(count = n(),
            SD = sd(ndrm.ch))
genotype_groups
# A tibble: 3 × 3
actn3.r577x count SD
<fct> <int> <dbl>
1 CC 173 30.0
2 CT 261 33.2
3 TT 161 35.7
General hypotheses
To test for a difference in means across k groups:
\[\begin{align} H_0 &: \mu_1 = \mu_2 = ... = \mu_k\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
Hypotheses test for example
\[\begin{align} H_0 &: \mu_{CC} = \mu_{CT} = \mu_{TT}\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
lm and aov
lm = linear model; we will use it frequently in BSTA 512
\[\begin{align} H_0 &: \mu_{CC} = \mu_{CT} = \mu_{TT}\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
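In R, this test can be run with either lm() or aov(). A sketch assuming the famuss data from the oibiostat package, as used in these slides:

```r
library(oibiostat)  # provides the famuss data
data(famuss)

# Fit the one-way ANOVA as a linear model, then extract the ANOVA table
fit_lm <- lm(ndrm.ch ~ actn3.r577x, data = famuss)
anova(fit_lm)   # Df, Sum Sq, Mean Sq, F value, Pr(>F)

# Equivalent fit with aov(); summary() prints the same table
fit_aov <- aov(ndrm.ch ~ actn3.r577x, data = famuss)
summary(fit_aov)
```

Both calls fit the same model; lm() is the interface we will rely on in later courses, while aov() reports its output directly as an ANOVA table.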
Conclusion statement:
Recall that, visually, the three genotype groups looked pretty similar.
In this case, I would also report the means and standard deviations of each genotype’s percent change in non-dominant arm strength.
Revised conclusion statement:
Lesson 17 Slides