Muddy Points

Published

October 14, 2024

Modified

October 16, 2024

1. Application of expected values and variance for discrete RVs

Expected value (mean) and variance, as we saw in this lesson about random variables, is the same as what we saw in Lesson 2 on numeric summaries! These are really good tools to calculate the center and spread of a distribution. In the case of a random variable with a Binomial distribution, this will tell us the expected number of “successes” and how spread out the number of successes are.

Let’s use some simulations to show this! The probability of success for all of these random variables is 0.5. The top part of the graphs shows \(n\), the number of trials. The centers and spreads for each of these random variables (all Binomally distributed) are different!

Code that I used to create these plots of binomial RVs

library(tidyverse)

x <- -5:250
n = c(6,14,30,60,100)
p = 0.5

binom = data.frame(x=rep(x, length(n)), 
                   y=dbinom(x, rep(n, each=length(x)), p),
                   n=rep(n, each=length(x)))

ggplot(binom %>% filter(y > 1e-5) %>% 
         group_by(n), 
       aes(x, y, colour=factor(n))) +
  geom_point(size=0.6) +
  geom_segment(aes(x=x, xend=x, y=0, yend=y, colour=factor(n)), lwd=0.8, alpha=0.5) +
  facet_grid(. ~ n, scales="free", space="free") +
  theme(legend.position = "none",
        axis.title = element_text(size = 14),    # Axis title size
        axis.text = element_text(size = 12),     # Axis text size
        strip.text = element_text(size = 13)) +  # Facet label size
  labs(x = "Number of successes", y = "Probability")

2. Use of these distributions

When it comes to important distributions in statistics, we want to get familiar with the the characteristics and calculations because it is an important step in the data analysis process.

Here is the basic data analysis process:

Recognize type of data we’re dealing with (continuous, discrete, categorical, ordinal, count?)
Determine the key characteristics of the data. What is it measuring? For example, count data can be out of a certain number of trials (binomial) or has no limit to the count (Poisson)
Identify the distribution of the data so we can easily describe/summarize the data (using known properties of the distribution)
Use specific tools that are designed specifically for that distribution to analyze the data (aka answer a specific research question)

3. Binomial distribution

Eek! I’ve ran out of time to organize and really get into this explanation. Instead, I’m going to drop a few links below. Keep Googling for other videos on Bernoulli and Binomial distributions! There are a lot out there, so let me know if one really resonates with you.

Here’s some back and forth between me and Chatgpt: https://chatgpt.com/share/670eb8e4-7644-8006-8eb9-c62348ecdef6
This video explain the Binomial dsitribution pretty well: https://www.youtube.com/watch?v=J8jNoF-K8E8&ab_channel=StatQuestwithJoshStarmer
- Watch up until ~10 minutes. After that, it goes into a test beyond us right now!
Here’s a video on Binomial distribution with an example and R calculations: https://www.youtube.com/watch?v=j_T66Cph7Ro&ab_channel=EquitableEquations

4. Variance of a discrete random variable

I wouldn’t get too caught up in the formula here. The important part of variance is that we measure the spread (as seen in Muddy Point 1) of a distribution or data.