Lesson 4: Conditional Probability

TB sections 2.2

Author

Meike Niederhausen and Nicky Wakim

Published

October 9, 2024

Learning Objectives

  1. Recognize joint, marginal, and conditional probabilities in contingency and probability tables

  2. Mathematically define probability properties that relate to conditional probability (general multiplication rule, independence and conditional probability, and Bayes’ theorem)

  3. Apply probability properties to solve a world problem on positive predictive value (PPV)

Where are we?

Learning Objectives

  1. Recognize joint, marginal, and conditional probabilities in contingency and probability tables
  1. Mathematically define probability properties that relate to conditional probability (general multiplication rule, independence and conditional probability, and Bayes’ theorem)

  2. Apply probability properties to solve a world problem on positive predictive value (PPV)

Example: hypertension prevalence (1/2)

  • US CDC estimated that between 2011 and 20141, 29% of the population in America had hypertension

 

  • A health care practitioner seeing a new patient would expect a 29% chance that the patient might have hypertension
    • However, this is only the case if nothing else is known about the patient

Example: hypertension prevalence (2/2)

  • Prevalence of hypertension varies significantly with age
    • Among adults aged 18-39, 7.3% have hypertension
    • Adults aged 40-59, 32.2%
    • Adults aged 60 or older, 64.9% have hypertension

 

  • Knowing the age of a patient provides important information about the likelihood of hypertension

    • Age and hypertension status are not independent (we will get into this)
  • While the probability of hypertension of a randomly chosen adult is 0.29…

    • The conditional probability of hypertension in a person known to be 60 or older is 0.649

 

How can we assemble the full picture of hypertension and age with probabilities?

Contingency tables

  • We can start looking at the contingency table for hypertension for different age groups
    • Contingency table: type of data table that displays the frequency distribution of two or more categorical variables
Table: Contingency table showing hypertension status and age group, in thousands.
Age Group Hypertension No Hypertension Total
18-39 years 8836 112206 121042
40 to 59 years 42109 88663 130772
Greater than 60 years 39917 21589 61506
Total 90862 222458 313320

Types of probabilities from contingency tables

Table: Contingency table showing hypertension status and age group, in thousands.
Age Group Hypertension No Hypertension Total
18-39 years 8836 112206 121042
40 to 59 years 42109 88663 130772
Greater than 60 years 39917 21589 61506
Total 90862 222458 313320
  • Joint probability

    • In first row, shows that in the entire population of 313,320,000, approximately 8,836,000 people were aged 18-39 years and had hypertension (~2.8%)
  • Marginal probability

    • We can say that in the entire population of 313,320,000, approximately 121,042,000 people are 18-39 years (~38.6%)
  • Conditional probability

    • But we can also say the first row shows that of 121,042,000 people who are 18-39 years, 8,836,000 people had hypertension (~7.3%)

Poll Everywhere Question 1

Probability tables

We typically display joint and marginal probabilities in probability table

 

Table: Probability table summarizing hypertension status and age group.
Age Group Hypertension No Hypertension Total
18-39 years 0.0282 0.3581 0.3863
40 to 59 years 0.1344 0.2830 0.4174
Greater than 60 years 0.1274 0.0689 0.1963
Total 0.2900 0.7100 1.0000
  • Joint probability: intersection of row and column

  • Marginal probability: row or column total

Let’s go back to conditional probability

  • So far we have intuitively thought of conditional probability and used the contingency table:
    • The first row shows that of 121,042,000 people who are 18-39 years, 8,836,000 people had hypertension (~7.3%)
  • We got this from: \[P(\text{hypertension} | \text{18-39 years old}) = \dfrac{8,836,000}{121,042,000} = 0.073\]
    • \(\text{hypertension} | \text{18-39 years old}\)” reads as “hypertension given 18-39 years old”

 

Can we calculate the conditional probability from the probability table?

Learning Objectives

  1. Recognize joint, marginal, and conditional probabilities in contingency and probability tables
  1. Mathematically define probability properties that relate to conditional probability (general multiplication rule, independence and conditional probability, and Bayes’ theorem)
  1. Apply probability properties to solve a world problem on positive predictive value (PPV)

We can define conditional probability more mathematically

  • Let’s define some events:
    • \(A\) = hypertension
    • \(B\) = 18-39 years old

\[P(\text{hypertension} | \text{18-39 years old}) = P(A | B) = \dfrac{P(A \cap B)}{P(B)}\]

Conditional probability

The conditional probability of an event A given an event or condition B is: \[P(A|B) = \frac{P(A \cap B)}{P(B)}\]

So if we had a table of probabilities for our example…

Table: Probability table summarizing hypertension status and age group.
Age Group Hypertension No Hypertension Total
18-39 years 0.0282 0.3581 0.3863
40 to 59 years 0.1344 0.2830 0.4174
Greater than 60 years 0.1274 0.0689 0.1963
Total 0.2900 0.7100 1.0000

 

What is the probability of hypertension for someone aged 18-39 years old?

Recall

  • \(A\) = hypertension
  • \(B\) = 18-39 years old
  • \[P(A|B) = \frac{P(A \cap B)}{P(B)}\]

General multiplication rule

General multiplication rule

If \(A\) and \(B\) represent two outcomes or events, then \[P(A \cap B) = P(A|B)P(B)\]

This follows from rearranging the definition of conditional probability: \[P(A|B) = \frac{P(A \cap B)}{P(B)} \rightarrow P(A|B)P(B) = P(A \cap B)\]

Independence and conditional probability

  • If two events, say A and B, are independent, then: \[P(A \cap B) = P(A)P(B)\]

  • We can extend this to conditional probability: \(P(A|B) = \frac{P(A \cap B)}{P(B)}\)

    • For two independent events, say A and B, \[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A)\]

Conditional probability of independent events

If events A and B are independent, then \[P(A|B) =P(A) \text{ and } P(B|A) = P(B)\]

Poll Everywhere Question 2

Bayes’ Theorem (Section 2.2.5)

Bayes’ Theorem

In its simplest form: \[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

This also translates to: \[P(A | B) = \frac{P(B|A) \cdot P(A)} {P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) }\] because of the Law of Total Probability: \[\begin{aligned}P(B) = & P(B \cap A) + P(B \cap A^C) \\ = &P(B | A)P(A)+ P(B|A^C)P(A^C) \end{aligned}\]

Learning Objectives

  1. Recognize joint, marginal, and conditional probabilities in contingency and probability tables

  2. Mathematically define probability properties that relate to conditional probability (general multiplication rule, independence and conditional probability, and Bayes’ theorem)

  1. Apply probability properties to solve a world problem on positive predictive value (PPV)

Example: How accurate is rapid testing for COVID-19? (1/n)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Suppose you take the iHealth® rapid test.

  1. What is the probability of a positive test result?

  2. What is the probability of having COVID-19 if you get a positive test result?

  3. What is the probability of not having COVID-19 if you get a negative test result?

From the iHealth® website https://ihealthlabs.com/pages/ihealth-covid-19-antigen-rapid-test-details:

Some specialized terminology in diagnostic tests

Calculating probabilities for diagnostic tests is done so often in medicine that the topic has some specialized terminology

  • The sensitivity of a test is the probability of a positive test result when disease is present, such as a positive mammogram when a patient has breast cancer.
  • The specificity of a test is the probability of a negative test result when disease is absent
  • The probability of disease in a population is referred to as the prevalence.
  • With specificity and sensitivity information for a particular test, along with disease prevalence, the positive predictive value (PPV) can be calculated: the probability that disease is present when a test result is positive.
  • Similarly, the negative predictive value is the probability that disease is absent when test results are negative

Poll Everywhere Question 3

General steps for probability word problems

  1. Define the events in the problem and draw a Venn Diagram

  2. Translate the words and numbers into probability statements

  3. Translate the question into a probability statement

  4. Think about the various definitions and rules of probabilities. Is there a way to define our question’s probability statement (in step 3) using the probability statements with assigned values (in step 2)?

  5. Plug in the given numbers to calculate the answer!

Let’s apply the steps to our example (1/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 1: Let’s define our events of interest

  • \(D\) = event one has disease (COVID-19)

  • \(D^c\) = event one does not have disease

  • \(T^+\) = event one tests positive for disease

  • \(T^-\) = event one tests negative for disease

Let’s apply the steps to our example (2/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 2: Translate given information into mathematical notation

  • Test correctly gives a positive result 94.3% of the time:

 

  • Test correctly gives a negative result 98.1% of the time:

 

  • 83.8 people per 100k in Multnomah County with Covid-19:

Let’s apply the steps to our example (3/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 3: Translate the question into a probability statement

  1. What is the probability of a positive test result?

 

  1. What is the probability of having COVID-19 if you get a positive test result?

 

  1. What is the probability of not having COVID-19 if you get a negative test result?

Let’s apply the steps to our example (4/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 4: Define our question’s probability statement using the probability statements with assigned values

  1. \(P(T^{+}) =\)

Let’s apply the steps to our example (5/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 4: Define our question’s probability statement using the probability statements with assigned values

  1. \(P(D|T^{+}) =\)

Let’s apply the steps to our example (6/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 4: Define our question’s probability statement using the probability statements with assigned values

  1. \(P(D^{\text{c}}|T^{-}) =\)

Let’s apply the steps to our example (7/7)

How accurate is rapid testing for COVID-19?

“Based on the results of a clinical study where the iHealth® COVID-19 Antigen Rapid Test was compared to an FDA authorized molecular SARS-CoV-2 test, iHealth® COVID-19 Antigen Rapid Test correctly identified 94.3% of positive specimens and 98.1% of negative specimens.” In October 2022, 83.8 people per 100k in Multnomah County with Covid-19.

Step 5: Calculate answer