TB sections 3.3-3.4
2024-10-16
Discrete random variable
A discrete r.v. \(X\) takes on a finite or countably infinite number of possible values.
Think:
Continuous random variable
A continuous r.v. \(X\) can take on any real value in an interval of values or unions of intervals.
Think:
Two important features of continuous distributions:
The total area under the density curve is 1.
The probability that a variable has a value within a specified interval is the area under the curve over that interval.
When working with continuous random variables, probability is found for intervals of values rather than individual values.
The probability that a continuous r.v. \(X\) takes on any single individual value is 0, because a single point has no area under the density curve.
Thus, \(P(a < X < b)\) is equal to \(P(a \leq X \leq b)\).
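These features can be checked numerically in R. The sketch below uses the standard normal density `dnorm()` only as a convenient example of a continuous density; the endpoints `a` and `b` are arbitrary values chosen for illustration.

```r
# Total area under a density curve should be 1
# (standard normal density used as the example density)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# P(a < X < b) is the area under the density between a and b
a <- -1   # illustrative endpoint
b <- 2    # illustrative endpoint
area_ab  <- integrate(dnorm, lower = a, upper = b)$value
cdf_diff <- pnorm(b) - pnorm(a)   # same probability from the cumulative distribution

print(c(total_area = total_area, area_ab = area_ab, cdf_diff = cdf_diff))
```

Because a single point contributes no area, including or excluding the endpoints does not change `area_ab`.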
A random variable X is modeled with a normal distribution if:
A standard normal distribution is defined as a normal distribution with mean 0 and variance 1. It is often denoted as \(Z \sim N(0, 1)\).
Any normal random variable \(X\) can be transformed into a standard normal random variable \(Z\).
\[Z = \dfrac{X - \mu}{\sigma} \qquad X = \mu + Z\sigma\]
The \(Z\)-score of an observation quantifies how far the observation is from the mean, in units of standard deviation(s).
For example, if an observation has \(Z\)-score \(z = 3.4\), then the observation is 3.4 standard deviations above the mean.
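A minimal sketch of the transformation in R. The values of `mu`, `sigma`, and `x` are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical normal distribution and observation (illustration only)
mu    <- 100
sigma <- 15
x     <- 130

z      <- (x - mu) / sigma   # z-score: number of standard deviations from the mean
x_back <- mu + z * sigma     # inverse transformation recovers the original x

# The cumulative probability is the same on either scale
p_x <- pnorm(x, mean = mu, sd = sigma)
p_z <- pnorm(z)

print(c(z = z, x_back = x_back, p_x = p_x, p_z = p_z))
```

Here the observation is 2 standard deviations above the mean, and `p_x` and `p_z` agree.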
Transformation from general normal \(X\) to standard normal \(Z\)
R commands with their input and output:
R code | What does it return? |
---|---|
rnorm() | returns a random sample of values from the specified normal distribution |
dnorm() | returns the value of the probability density at a given point of the normal distribution |
pnorm() | returns the cumulative probability of getting a given value (or less) from the normal distribution |
qnorm() | returns the value (the z-score, for the standard normal) corresponding to a desired quantile |
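A short sketch of the four commands for the standard normal distribution. The seed and the input values are arbitrary choices for illustration.

```r
set.seed(1)                           # arbitrary seed so the random draw is reproducible
draws <- rnorm(5, mean = 0, sd = 1)   # five random values from N(0, 1)

dens  <- dnorm(0)       # height of the density at 0, equal to 1 / sqrt(2 * pi)
cum   <- pnorm(1.96)    # P(Z <= 1.96), about 0.975
quant <- qnorm(0.975)   # value with 97.5% of the area to its left, about 1.96

print(draws)
print(c(dens = dens, cum = cum, quant = quant))
```

`pnorm()` and `qnorm()` are inverses of each other: one maps a value to a cumulative probability, the other maps a cumulative probability back to a value.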
Three ways to calculate probabilities from a normal distribution:
Normal probability table
R commands
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE)
Random online calculators
Example: Calculating standard normal probabilities practice
Let \(Z\) be a standard normal random variable, \(Z\sim N(\mu=0,\sigma=1)\). Calculate the following probabilities. Include sketches of the normal curves with the probability areas shaded in.
\(\mathbb{P}( Z < 2.67 )\)
\(\mathbb{P}( Z > -0.37 )\)
\(\mathbb{P}( -2.18 < Z < 2.46 )\)
\(\mathbb{P}(Z = 1.53 )\)
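One way to set up these four probabilities with `pnorm()`, as a sketch to check hand or table calculations against (the variable names are just labels for this example):

```r
p_a <- pnorm(2.67)                        # P(Z < 2.67): area to the left of 2.67
p_b <- pnorm(-0.37, lower.tail = FALSE)   # P(Z > -0.37): area to the right of -0.37
p_c <- pnorm(2.46) - pnorm(-2.18)         # P(-2.18 < Z < 2.46): area between the two values
p_d <- 0                                  # P(Z = 1.53): a single value has probability 0

print(round(c(p_a = p_a, p_b = p_b, p_c = p_c, p_d = p_d), 4))
```

Setting `lower.tail = FALSE` returns the upper-tail area directly, which is equivalent to `1 - pnorm(-0.37)`.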
Example: Diastolic blood pressure (DBP)
Suppose the distribution of diastolic blood pressure (DBP) in 35- to 44-year-old men is normally distributed with mean 80 mm Hg and variance 144 (mm Hg)\(^2\), so the standard deviation is \(\sqrt{144} = 12\) mm Hg.
Mild hypertension is when the DBP is between 90 and 99 mm Hg. What proportion of this population has mild hypertension?
What is the \(10^{th}\) percentile of the DBP distribution?
What is the \(95^{th}\) percentile of the DBP distribution?
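A sketch of how these three questions could be set up in R, assuming the interval for mild hypertension is treated as \(90 < X < 99\) on a continuous scale. Note that `pnorm()` and `qnorm()` take the standard deviation, not the variance.

```r
mu    <- 80          # mean DBP in mm Hg
sigma <- sqrt(144)   # standard deviation: square root of the variance, 12 mm Hg

# Proportion with mild hypertension: P(90 < X < 99)
p_mild <- pnorm(99, mean = mu, sd = sigma) - pnorm(90, mean = mu, sd = sigma)

# 10th and 95th percentiles of the DBP distribution
pct10 <- qnorm(0.10, mean = mu, sd = sigma)
pct95 <- qnorm(0.95, mean = mu, sd = sigma)

print(round(c(p_mild = p_mild, pct10 = pct10, pct95 = pct95), 3))
```

The same percentiles follow from \(X = \mu + Z\sigma\) with the standard normal quantiles `qnorm(0.10)` and `qnorm(0.95)`.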
As \(n\) gets large, the shape of the distribution of a binomial r.v. becomes more and more symmetric and can be approximated by a normal distribution.
Pretty good video on the intuition behind this (Watch 00:00 - 05:40)
Also known as: Sampling distribution of \(\widehat{p}\)
If \(X\sim \text{Binomial}(n,p)\) with \(np>10\) and \(nq = n(1-p) > 10\), then approximately
\[X\sim \text{Normal}\big(\mu_X = np, \sigma_X = \sqrt{np(1-p)} \big)\]
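Continuity correction: an adjustment of 0.5 applied to the cutoff to account for the fact that the binomial distribution is discrete while the normal distribution is continuous. For example, \(P(X \leq 19)\) for the binomial is approximated by the normal probability of a value at or below 19.5.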
Example: Vaccinated people testing positive for Covid-19 (revisited)
About 25% of people that test positive for Covid-19 are vaccinated for it. Suppose 100 people have tested positive for Covid-19 (independently of each other). Let \(X\) denote the number of people that are vaccinated among the 100 that tested positive. What is the probability that fewer than 20 of the people that tested positive are vaccinated?
Calculate exact probability.
Calculate approximate probability.
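\(p=0.25\), \(n=100\), we want \(P(X < 20) = P(X \leq 19)\)
Approximate probability: use the normal approximation with \(\mu = np = 25\) and \(\sigma = \sqrt{np(1-p)} = \sqrt{18.75} \approx 4.33\)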
\[X \sim \text{Normal}\big(\mu=25, \sigma = 4.33\big)\]
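A sketch comparing the exact binomial probability with the normal approximation in R. It assumes the continuity correction is applied by evaluating the normal curve at 19.5; the version without the correction is included for comparison.

```r
n <- 100
p <- 0.25

# Exact: P(X < 20) = P(X <= 19) for X ~ Binomial(100, 0.25)
exact <- pbinom(19, size = n, prob = p)

# Normal approximation with mean np and sd sqrt(np(1 - p))
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
approx_cc    <- pnorm(19.5, mean = mu, sd = sigma)   # with continuity correction
approx_no_cc <- pnorm(20,   mean = mu, sd = sigma)   # without continuity correction

print(round(c(exact = exact, approx_cc = approx_cc, approx_no_cc = approx_no_cc), 4))
```

Both conditions for the approximation hold here, since \(np = 25\) and \(n(1-p) = 75\) are greater than 10.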
The Poisson distribution is often used to model count data (number of events), especially for rare events.
Example: Historical records of hospitalizations in New York City indicate that an average of 4.4 people are hospitalized each day for an acute myocardial infarction (AMI).
Suppose events occur over time in such a way that
The probability an event occurs in an interval is proportional to the length of the interval.
Events occur independently at a rate \(\lambda\) per unit of time.
Then the probability of exactly \(k\) events in one unit of time is \[ P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \,\, k = 0, 1, 2, \ldots \]
For the Poisson distribution modeling the number of events in one unit of time:
The mean is \(\lambda\).
The standard deviation is \(\sqrt{\lambda}\).
Shorthand for a random variable, \(X\), that has a Poisson distribution: \[X \sim \text{Pois}(\lambda)\]
R commands with their input and output:
R code | What does it return? |
---|---|
rpois() | returns a random sample of values from the specified Poisson distribution |
dpois() | returns the probability of exactly a given number of events, \(P(X = k)\), for the Poisson distribution |
ppois() | returns the cumulative probability of getting a given number of events (or fewer) from the Poisson distribution |
qpois() | returns the number of events corresponding to a desired quantile |
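A short sketch of the four commands, using the AMI rate of 4.4 hospitalizations per day as \(\lambda\). The seed, the sample size, and the count of 4 are arbitrary choices for illustration.

```r
lambda <- 4.4   # average AMI hospitalizations per day

set.seed(1)                          # arbitrary seed so the random draw is reproducible
draws <- rpois(7, lambda = lambda)   # simulated daily counts for one week

p_eq_4 <- dpois(4, lambda = lambda)     # P(X = 4): exactly 4 hospitalizations in a day
p_le_4 <- ppois(4, lambda = lambda)     # P(X <= 4): 4 or fewer in a day
med    <- qpois(0.5, lambda = lambda)   # smallest count with cumulative probability >= 0.5

print(draws)
print(c(p_eq_4 = p_eq_4, p_le_4 = p_le_4, median = med))
```

`dpois()` matches the formula \(e^{-\lambda}\lambda^{k}/k!\), and `ppois()` is the running sum of those probabilities from 0 up to the given count.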
Typhoid fever
Suppose there are on average 5 deaths from typhoid fever over a 1-year period.
What is the probability of 3 deaths in a year?
What is the probability of 2 deaths in 0.5 years?
What is the probability of more than 12 deaths in 2 years?
\(\lambda = 5\) and we want \(P(X = 3)\)
\[P(X=3) = \frac{e^{-5}5^{3}}{3!} = 0.1404\]
\(\lambda = ?\) and we want \(P(X = 2)\)
\(\lambda=5\) was the rate for one year. For half a year, we need to calculate a new \(\lambda\): \(\lambda = 5 \times 0.5 = 2.5\)
\[P(X=2) = \frac{e^{-2.5}2.5^{2}}{2!} = 0.2565\]
\(\lambda = ?\) and we want \(P(X > 12)\)
Two years at a rate of 5 deaths per year gives \(\lambda = 5 \times 2 = 10\):
\[P(X>12) = 1 - P(X \leq 12) = 1 - \sum_{k=0}^{12}\frac{e^{-10}10^{k}}{k!} = 0.2084\]
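The three typhoid fever calculations can be checked in R along these lines. The sketch rescales the yearly rate of 5 to the length of each time period; the variable names are just labels for this example.

```r
rate_per_year <- 5

# 3 deaths in 1 year: lambda = 5
p_3_in_1yr <- dpois(3, lambda = rate_per_year * 1)

# 2 deaths in 0.5 years: lambda = 5 * 0.5 = 2.5
p_2_in_half <- dpois(2, lambda = rate_per_year * 0.5)

# More than 12 deaths in 2 years: lambda = 5 * 2 = 10, P(X > 12) = 1 - P(X <= 12)
p_gt12_in_2yr <- ppois(12, lambda = rate_per_year * 2, lower.tail = FALSE)

print(round(c(p_3_in_1yr = p_3_in_1yr,
              p_2_in_half = p_2_in_half,
              p_gt12_in_2yr = p_gt12_in_2yr), 4))
```

Using `lower.tail = FALSE` in `ppois()` gives \(P(X > 12)\) directly, equivalent to `1 - ppois(12, lambda = 10)`.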
Lesson 6 Slides