Muddy Points

Lesson 11: Hypothesis testing part 01

Modified

November 5, 2025

Fall 2025

1. Wording for reject vs fail to reject null hypothesis

I asked Gemini to help me explain this one, and I really liked its analogy. From Gemini:

The “Innocent Until Proven Guilty” Analogy

Think of hypothesis testing like a criminal trial:

The Null Hypothesis (\(H_0\)): This is the default assumption, like the defendant being “innocent.”
The Alternative Hypothesis (\(H_A\)): This is the new claim being tested, like the prosecutor arguing the defendant is “guilty.”
The Data: This is the evidence presented in court.

The jury’s job is not to prove the defendant is innocent. Their job is to see if there is enough evidence beyond a reasonable doubt to reject the idea of their innocence.

At the end of the trial, the jury has two possible verdicts:

“Guilty”: There was enough strong evidence to reject the assumption of innocence. In statistics, we say we “reject the null hypothesis.”
“Not Guilty”: This does not mean the jury proved the defendant is innocent. It just means there was not enough evidence to convince them of guilt. In statistics, we say we “fail to reject the null hypothesis.”

We never say the defendant was “proven innocent,” just like we never say we “accept the null.”

2. Why is the standard p threshold 0.05? Is it because it’s similar to 95% CI?

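This is a very philosophical question, and even the American Statistical Association does not have a definitive answer! The 0.05 cutoff is a convention, not something derived mathematically. It does line up with the 95% CI, though: since \(\alpha = 0.05 = 1 - 0.95\), a two-sided test at the 0.05 level rejects the null exactly when the 95% confidence interval does not contain the null value.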
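One way to see the link between 0.05 and the 95% CI is to check it in R. This is a minimal sketch with made-up data: the vector `x`, the helper `agree()`, and the null values tried are all assumptions for illustration, not something from class.

```r
# Hypothetical sample; the values are made up for illustration
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7)

# Hypothetical helper: TRUE when the two-sided test at alpha = 0.05
# and the 95% CI give the same answer about the null value mu0
agree <- function(mu0) {
  res <- t.test(x, mu = mu0, conf.level = 0.95)
  rejects  <- res$p.value < 0.05
  excluded <- mu0 < res$conf.int[1] || mu0 > res$conf.int[2]
  rejects == excluded
}

agree(5.0)  # a null value far from the sample mean
agree(5.8)  # a null value close to the sample mean
```

Rejecting at 0.05 and having the null value fall outside the 95% CI are two views of the same decision, which is why the two numbers travel together.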

4. How the tidy() function changes the p-value moving forward

It doesn’t change the p-value at all. tidy() (from the broom package) just changes the formatting of the results from t.test(), putting them into a table so that we can view and use them differently.
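To check this yourself, you can compare the p-value before and after tidying. This is a sketch with made-up data (the vector `x` and the null value 5 are assumptions for illustration), and the tidy() part only runs if the broom package is installed.

```r
# Hypothetical sample; the values are made up for illustration
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7)

res <- t.test(x, mu = 5)
print(res$p.value)  # p-value straight from t.test()

# tidy() lives in the broom package; this part is skipped if broom is missing
if (requireNamespace("broom", quietly = TRUE)) {
  tidy_res <- broom::tidy(res)  # one-row table of the same results
  print(tidy_res)
  # Compare the p-value stored in the table with the original one
  print(all.equal(as.numeric(tidy_res$p.value), as.numeric(res$p.value)))
}
```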

5. I wasn’t clear on how I would choose to do a one-sided t test in R

Remember, you can go into the console in R and type ?t.test to get information on the function.

Here’s what I see when I type that:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95)

If we want to do a one-sided t-test in R, we would set the alternative argument to “less” or “greater”, matching the direction of our alternative hypothesis (\(H_A\)).
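Here is a small sketch of what that looks like in practice. The data in `x` and the null value of 5 are made up for illustration.

```r
# Hypothetical sample; the values are made up for illustration
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7)

# One-sided test where H_A says the population mean is greater than 5
greater <- t.test(x, mu = 5, alternative = "greater")

# One-sided test where H_A says the population mean is less than 5
less <- t.test(x, mu = 5, alternative = "less")

# Two-sided test (the default), for comparison
two_sided <- t.test(x, mu = 5)

greater$p.value
less$p.value
two_sided$p.value
```

For a t-test, the two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of the two. Pick the direction from your alternative hypothesis before looking at the data.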

Fall 2024

1. What is the test statistic? (Is it the same as a t-test?)

The “t-test” is the shorthand that people use to refer to the process of calculating the test statistic and p-value.

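The test statistic measures how far our sample mean (\(\overline{x}\)) is from the population mean that we assume under the null (\(\mu\)), in units of standard errors. So the test statistic is one piece of the t-test, not the whole thing.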
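As a sketch, here is the one-sample test statistic computed by hand and compared with what t.test() reports. It assumes the usual one-sample formula, \(t = (\overline{x} - \mu) / (s / \sqrt{n})\), where \(s\) is the sample standard deviation and \(n\) is the sample size. The data in `x` and the null value `mu0` are made up for illustration.

```r
# Hypothetical sample; the values are made up for illustration
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7)
mu0 <- 5  # population mean assumed under the null (an assumption)

x_bar <- mean(x)    # sample mean
s     <- sd(x)      # sample standard deviation
n     <- length(x)  # sample size

# Distance between sample mean and null mean, in standard errors
t_by_hand <- (x_bar - mu0) / (s / sqrt(n))

# The same quantity as reported by t.test()
t_from_r <- unname(t.test(x, mu = mu0)$statistic)

all.equal(t_by_hand, t_from_r)
```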

2. The specific language around and conclusions that can be drawn from p-values

We basically have two options once we have a p-value:

If the p-value is less than \(\alpha\), then we reject the null hypothesis.
If the p-value is greater than or equal to \(\alpha\), then we fail to reject the null hypothesis.

So let’s get into it:

If the p-value is less than \(\alpha\), then we reject the null hypothesis.
- When the p-value is less than \(\alpha\), our observed result falls in a range of outcomes that would be very unlikely if the null were true.
- Remember, the p-value measures the probability of obtaining a sample mean just as extreme or more extreme than the observed sample mean (\(\overline{x}\)) assuming the null hypothesis is true (that the population mean is \(\mu\)).
- The smaller the p-value, the more evidence we have that our sample mean (\(\overline{x}\)) is NOT from the distribution with population mean \(\mu\).
- There is a cutoff for when we decide that our sample mean (\(\overline{x}\)) is NOT from the distribution with population mean \(\mu\). That cutoff is the significance level (\(\alpha\)).
- So when the p-value is less than \(\alpha\), we say we have sufficient evidence that our sample is not from a population with mean \(\mu\). That the population mean is \(\mu\) was our null hypothesis, so we reject the null.
If the p-value is greater than or equal to \(\alpha\), then we fail to reject the null hypothesis.
- When the p-value is greater than or equal to \(\alpha\), our observed result is still in the range of outcomes that are reasonably likely if the null is true.
- Remember, the p-value measures the probability of obtaining a sample mean just as extreme or more extreme than the observed sample mean (\(\overline{x}\)) assuming the null hypothesis is true (that the population mean is \(\mu\)).
- When the p-value is pretty large, our sample is consistent with the assumed null distribution.
- However, being consistent with the null is not the same as proving it, and proving the null is not what our hypothesis test is set up to do, so we can’t make any claims about “accepting the null.” Instead, we fail to reject it.
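The decision rule above can be sketched in R. This assumes a two-sided one-sample t-test with \(\alpha = 0.05\). The data in `x`, the null value `mu0`, and the variable names are made up for illustration.

```r
# Hypothetical sample; the values are made up for illustration
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1, 5.7)
mu0   <- 5     # population mean assumed under the null (an assumption)
alpha <- 0.05  # significance level, chosen before looking at the data

res <- t.test(x, mu = mu0)

# Two-sided p-value by hand: probability of a t statistic at least this
# extreme in either tail, assuming the null is true
t_stat    <- unname(res$statistic)
df        <- unname(res$parameter)
p_by_hand <- 2 * pt(-abs(t_stat), df)

all.equal(p_by_hand, res$p.value)

# Compare the p-value to alpha and word the conclusion
decision <- if (res$p.value < alpha) {
  "reject the null"
} else {
  "fail to reject the null"
}
decision
```

Note that the two possible conclusions are "reject" and "fail to reject"; there is no branch that says "accept the null."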

3. Help with the bigger picture of \(H_0\) and \(H_A\). In a real world example, would it only be notable to reject \(H_0\)? If a researcher fails to reject the null, would their findings not be believable?

The idea behind the hypothesis test is usually that the null is the currently accepted value, and that we are presenting a “challenger” to the null. The challenger is only noteworthy if we reject the null and establish a new range of values for the population mean (from the confidence interval).

Thus, if we fail to reject the null, it’s not that we don’t believe our findings, it just means that we are not presenting new information to the world.