Lesson 4: Measurements of Association and Agreement
Learning Objectives
- Identify cases when it is appropriate to use risk difference, relative risk, or odds ratios
- Expand work on contingency tables to evaluate the agreement or reproducibility using Cohen’s Kappa
Last class
Used contingency tables to test and measure association between two variables
- Categorical outcome variable (Y)
- One categorical explanatory variable (X)
We looked at risk difference, risk ratio, and odds ratio to measure association
| Measure | Estimate |
|---|---|
| Risk difference | \[\widehat{RD} = \widehat{p}_1 - \widehat{p}_2 = \dfrac{n_{11}}{n_1} - \dfrac{n_{21}}{n_2}\] |
| Relative risk / risk ratio | \[\widehat{RR}=\dfrac{\hat{p}_1}{\hat{p}_2} = \dfrac{n_{11}/n_1}{n_{21}/n_2}\] |
| Odds ratio | \[\widehat{OR}=\frac{odds_1}{odds_2}=\frac{{\hat{p}}_1/(1-{\hat{p}}_1)}{{\hat{p}}_2/(1-{\hat{p}}_2)}\] |
- Discussed how OR will be an important measurement in logistic regression
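The three measures in the table can be computed directly from a 2 x 2 table. A minimal sketch, assuming rows are the two groups and the first column counts the outcome; the counts and the function name `assoc_measures` are hypothetical, chosen only for illustration:

```r
# Hypothetical 2 x 2 table (counts invented for illustration):
# rows = group 1 / group 2, columns = outcome yes / outcome no
tab <- matrix(c(30, 70,
                10, 90), nrow = 2, byrow = TRUE)

assoc_measures <- function(tab) {
  n1 <- sum(tab[1, ])      # row total, group 1
  n2 <- sum(tab[2, ])      # row total, group 2
  p1 <- tab[1, 1] / n1     # estimated risk, group 1
  p2 <- tab[2, 1] / n2     # estimated risk, group 2
  c(RD = p1 - p2,
    RR = p1 / p2,
    OR = (p1 / (1 - p1)) / (p2 / (1 - p2)))
}

assoc_measures(tab)   # RD = 0.20, RR = 3.00, OR approx. 3.86
```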
A taste of regression for a binary outcome (we will come back to this!!)
```r
logreg = glm(case ~ glucimp, data = SHS, family = binomial)
summary(logreg)
```

```
glm(formula = case ~ glucimp, family = binomial, data = SHS)

              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -0.52287    0.08969   -5.83 5.55e-09 ***
glucimpNormal -1.53684    0.12982  -11.84  < 2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1646.3  on 1663  degrees of freedom
Residual deviance: 1501.3  on 1662  degrees of freedom
AIC: 1505.3

Number of Fisher Scoring iterations: 4
```

```
Logistic regression predicting case

                     OR(95%CI)          P(Wald's test)  P(LR-test)
glucimp (cont. var.) 0.22 (0.17,0.28)   < 0.001         < 0.001

Log-likelihood = -750.6533
No. of observations = 1664
AIC value = 1505.3066
```
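The OR column is the exponentiated coefficient. With the fitted object available, `exp(coef(logreg))` and `exp(confint.default(logreg))` would give the OR and its Wald interval; the sketch below instead plugs in the printed `glucimpNormal` estimate and standard error, so it runs without the SHS data:

```r
# Coefficient and SE copied from the summary(logreg) output above
b_normal  <- -1.53684   # log odds ratio: glucimp = Normal vs. the reference level
se_normal <-  0.12982   # its standard error

or_normal <- exp(b_normal)                                 # odds ratio
ci_normal <- exp(b_normal + c(-1, 1) * 1.96 * se_normal)   # Wald 95% CI

round(or_normal, 2)       # 0.22
round(ci_normal, 2)       # 0.17 0.28
round(1 / or_normal, 2)   # 4.65: the same OR with the comparison flipped
```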
Relationship Between RR and OR (1/2)
- Notice that odds ratio is not equivalent to relative risk (or risk ratio)
However, when the probability of “success” is small (e.g., a rare disease), \(\widehat{OR}\) is a good approximation of \(\widehat{RR}\): \[\widehat{OR}=\frac{\hat{p}_1/(1-\hat{p}_1)}{\hat{p}_2/(1-\hat{p}_2)}=\widehat{RR}\cdot \frac{1-\hat{p}_2}{1-\hat{p}_1}\]
- The fraction in the last term of the above expression is approximately 1 when \(\widehat{p}_1\) and \(\widehat{p}_2\) are BOTH small (< 0.1)
In the SHS data, \(\widehat{OR}\) and \(\widehat{RR}\) are not close to each other because diabetes is not a rare disease:
- \(\widehat{OR} = 4.65\)
- \(\widehat{RR} = 3.29\)
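Because the SHS model has a single binary predictor, its fitted probabilities equal the observed group proportions. A sketch that backs out \(\hat{p}_1\) and \(\hat{p}_2\) from the printed (rounded) coefficients with the inverse logit `plogis()`, then checks both SHS values and the identity above:

```r
# Log odds from the SHS logistic regression output shown earlier
b0 <- -0.52287   # intercept: log odds of case in the reference glucimp group
b1 <- -1.53684   # glucimpNormal coefficient

p1 <- plogis(b0)        # estimated risk, reference group
p2 <- plogis(b0 + b1)   # estimated risk, glucimp = Normal

rr <- p1 / p2
or <- (p1 / (1 - p1)) / (p2 / (1 - p2))

round(c(p1 = p1, p2 = p2, RR = rr, OR = or), 3)
# OR equals RR times the correction factor (1 - p2) / (1 - p1)
all.equal(or, rr * (1 - p2) / (1 - p1))
```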
Relationship Between RR and OR (2/2)
An example where the disease is rare over the whole sample (~1%), but …
- \(\widehat{OR}\) is not a good estimate of \(\widehat{RR}\), even for this “rare” disease
- \(\widehat{p}_1 = 0.5\), so the outcome is not rare in group 1 and \(\widehat{OR}\) and \(\widehat{RR}\) are very different
\[\widehat{RR}=\frac{0.5}{0.00102}\approx 490 \text{ and } \widehat{OR} = \frac{0.5/(1-0.5)}{0.00102/(1-0.00102)}\approx 979\]
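Plugging the rounded risks into both formulas shows where the gap comes from: \(\hat{p}_2\) is tiny, but \(1-\hat{p}_1 = 0.5\) is far from 1, so the correction factor linking OR to RR is about 2. With the rounded value 0.00102 the OR lands near 979; the exact figure depends on the unrounded risk.

```r
p1 <- 0.5       # risk in group 1 (from the example)
p2 <- 0.00102   # risk in group 2 (rounded)

rr <- p1 / p2
or <- (p1 / (1 - p1)) / (p2 / (1 - p2))
correction <- (1 - p2) / (1 - p1)   # OR = RR * correction

round(c(RR = rr, OR = or, correction = correction), 2)
```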
RR in retrospective case-control study (1/3)
In retrospective case-control studies, we identify cases (patients with the outcome), then select a number of controls (patients without the outcome)
Case-control studies require much smaller sample sizes than equivalent cohort studies
So we pick out the cases and controls first, then determine whether each was exposed
However, the proportion of cases in the sample does not represent the proportion of cases in the population
- RR compares probability of the outcome (case) for exposed and unexposed groups
- The number of outcomes has been artificially inflated in a case-control study, so the sample "risks" do not estimate population risks
RR in retrospective case-control study (2/3)
- Assume a 1:2 case-control study summarized in the table below:
- Assume we compute the RR as if it were from a cohort study:
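The 1:2 table itself is not reproduced in this text. A minimal sketch, ASSUMING the only layout consistent with the OR calculation later in this lesson (\(\frac{40\cdot160}{40\cdot60}\)) and a 1:2 ratio: exposed = 40 cases and 40 controls, unexposed = 60 cases and 160 controls:

```r
# ASSUMED layout of the 1:2 case-control table (not shown in the text):
# rows = exposed / unexposed, columns = case / control
cc <- matrix(c(40,  40,
               60, 160), nrow = 2, byrow = TRUE,
             dimnames = list(exposure = c("exposed", "unexposed"),
                             outcome  = c("case", "control")))

colSums(cc)   # 100 cases, 200 controls: the 1:2 sampling ratio

# "Risks" computed as if this were a cohort study
risk_exposed   <- cc["exposed", "case"]   / sum(cc["exposed", ])
risk_unexposed <- cc["unexposed", "case"] / sum(cc["unexposed", ])
naive_rr <- risk_exposed / risk_unexposed
round(naive_rr, 3)   # 1.833
```

Because the investigator fixed the number of controls, these row proportions reflect the sampling design rather than population risk.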
RR in retrospective case-control study (3/3)
- In the real world, the proportion of controls (non-diseased people) is typically much higher. Assume the table below shows the population proportions, as in a cohort study
- The estimated RR for the patient population is:
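The population table is not reproduced here either. As a hypothetical reconstruction, suppose it keeps the same 40 exposed and 60 unexposed cases but has ten times as many non-diseased people in each exposure group (400 and 1,600); this choice is an assumption, picked because it reproduces the population RR of 2.5152 reported later in the lesson:

```r
# ASSUMED population table: same cases as the assumed 1:2 table,
# with 10 times as many non-cases in each exposure group
pop <- matrix(c(40,  400,
                60, 1600), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome  = c("case", "non-case")))

prop.table(pop, 1)[, "case"]   # risk of disease within each exposure group
pop_rr <- (pop[1, 1] / sum(pop[1, ])) / (pop[2, 1] / sum(pop[2, ]))
round(pop_rr, 4)   # 2.5152
```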
Notes for Odds Ratios
The OR is valid for
- Case-control studies (where the RR is not appropriate)
- Prospective cohort studies
- Cross-sectional studies
It can be interpreted either as…
- Odds of event for exposed vs. unexposed individuals, or
- Odds of exposure for individuals with vs. without the event of interest
- Pay attention to the numerator and denominator for the OR
OR in retrospective case-control study
While we cannot estimate RR from a case-control study, we can still estimate the OR
OR does not require us to distinguish between the outcome variable and explanatory variable in the contingency table
- In other words, the odds ratio of disease comparing exposed to unexposed is the same as the odds ratio of exposure comparing diseased to non-diseased
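The symmetry follows from \(\widehat{OR} = \frac{ad}{bc}\) for a table laid out as \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\): transposing the table (swapping the roles of outcome and exposure) leaves the cross-product ratio unchanged. A sketch, again ASSUMING the 1:2 layout 40/40/60/160; the helper name `odds_ratio` is hypothetical:

```r
# Cross-product odds ratio for a 2 x 2 table [a b; c d]
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# ASSUMED layout: rows = exposed / unexposed, columns = case / control
cc <- matrix(c(40, 40, 60, 160), nrow = 2, byrow = TRUE)

or_disease  <- odds_ratio(cc)     # odds of disease, exposed vs. unexposed
or_exposure <- odds_ratio(t(cc))  # odds of exposure, cases vs. controls
c(or_disease, or_exposure)        # both approx. 2.667
```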
For a case-control study where the probability of the outcome in the population is small, \(\widehat{OR}\) is a good approximation to the population \(\widehat{RR}\)
For the 1:2 case-control table: \(\widehat{OR}=\frac{40\cdot160}{40\cdot60} = 2.667\)
Population cohort study: \(\widehat{RR}=2.5152\)
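Why the OR survives case-control sampling: multiplying the whole control column by any factor \(k\) cancels in \(\frac{ad}{bc}\), while the RR keeps changing with \(k\). A sketch under the same ASSUMED 40/40/60/160 layout, where \(k = 1\) is the 1:2 design and \(k = 10\) is the hypothetical population table; helper names are hypothetical:

```r
# Multiply the non-case column by k to mimic sampling more or fewer controls
scale_controls <- function(k) {
  matrix(c(40,  40 * k,
           60, 160 * k), nrow = 2, byrow = TRUE)
}
rr_of <- function(tab) (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))
or_of <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

k <- c(1, 10, 100)
res <- t(sapply(k, function(kk) c(k  = kk,
                                  RR = rr_of(scale_controls(kk)),
                                  OR = or_of(scale_controls(kk)))))
round(res, 4)   # OR stays 2.6667; RR rises toward it as the outcome gets rarer
```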
Which measurement should one use?
Measuring Agreement
Still within the realm of contingency tables
What if we are NOT looking at the association between two variables?
What if we want to look at the agreement between two things?
- Answers from the same subjects to the same survey taken at different times
- Two different radiologists’ assessment of the same X-ray
Cohen’s Kappa statistics: widely used as a measure of agreement
- Example: Reliability studies, interobserver agreement
Let’s get our mood data down!
Measuring Agreement
If perfect agreement among the two raters/surveys:
- We would expect nonzero entries only in the diagonal cells of the table
\(p_o\) is the observed proportion of complete agreement (concordance)
\(p_E\) is the expected proportion of complete agreement if the agreement is just due to chance
If \(p_o\) is much greater than \(p_E\), the agreement level is high.
- Otherwise, the agreement level is low
- Cohen’s Kappa is based on the difference between \(p_o\) and \(p_E\): \[\hat{\kappa}=\frac{p_o-p_E}{1-p_E}\]
- \(\hat{\kappa} = 0\): No agreement between surveys/raters other than what would be expected by chance
- \(\hat{\kappa} = 1\): Complete agreement
Measuring Agreement: Cohen’s Kappa
Point estimate: \[\hat{\kappa}=\frac{p_o-p_E}{1-p_E}\]
- With \(p_o=\ \frac{\sum_{i}\ n_{ii}}{n}\) (sum of diagonals divided by total)
- With \(p_E=\sum_{i}{a_ib_i}\)
- The point estimate lies in \([-1, 1]\)
What’s \(\sum_{i}{a_ib_i}\)?
For each response category \(i\) (rows/columns), \(a_i\) is the proportion of responses in category \(i\) on the first survey (row margin) and \(b_i\) is the proportion in category \(i\) on the second survey (column margin); we'll show this in the example
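The definitions of \(p_o\), \(a_i\), \(b_i\), and \(p_E\) translate line by line into base R. A sketch using the mood table from the example in this lesson (which survey is on the rows is immaterial to \(\hat{\kappa}\)):

```r
# Mood agreement table from the example (rows and columns = the two surveys)
tab <- matrix(c(100, 40,
                 10, 30), nrow = 2, byrow = TRUE)
n <- sum(tab)

p_o <- sum(diag(tab)) / n    # observed agreement: diagonal / total
a   <- rowSums(tab) / n      # a_i: category proportions, first survey
b   <- colSums(tab) / n      # b_i: category proportions, second survey
p_E <- sum(a * b)            # agreement expected by chance

kappa_hat <- (p_o - p_E) / (1 - p_E)
round(c(p_o = p_o, p_E = p_E, kappa = kappa_hat), 4)   # 0.7222 0.5617 0.3662
```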
- Approximate standard error:
\[ SE_{\widehat{\kappa}} = \sqrt{\frac{1}{n\left(1-p_E\right)^2}\left\{p_E^2+p_E-\sum_{i}\left[a_ib_i\left(a_i+b_i\right)\right]\right\}}\]
- 95% Wald confidence interval for \(\widehat{\kappa}\):
\[\widehat{\kappa} \pm 1.96 \cdot SE_{\widehat{\kappa}}\]
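A sketch of the SE and Wald interval for the mood table. Note that the formula above gives an SE of about 0.069, while the `epi.kappa()` output in this lesson reports 0.0762; by our arithmetic, the simpler expression \(\sqrt{p_o(1-p_o)/\{n(1-p_E)^2\}}\) reproduces that reported value and interval, so epiR appears to use it. Both are computed below:

```r
tab <- matrix(c(100, 40, 10, 30), nrow = 2, byrow = TRUE)  # mood table
n   <- sum(tab)
p_o <- sum(diag(tab)) / n
a   <- rowSums(tab) / n
b   <- colSums(tab) / n
p_E <- sum(a * b)
kappa_hat <- (p_o - p_E) / (1 - p_E)

# Approximate SE from the formula above
se_formula <- sqrt((p_E^2 + p_E - sum(a * b * (a + b))) / (n * (1 - p_E)^2))
ci_formula <- kappa_hat + c(-1, 1) * 1.96 * se_formula

# Simpler SE that appears to match the epi.kappa() output
se_simple <- sqrt(p_o * (1 - p_o) / (n * (1 - p_E)^2))
ci_simple <- kappa_hat + c(-1, 1) * 1.96 * se_simple

round(c(se_formula = se_formula, se_simple = se_simple), 4)
round(rbind(ci_formula, ci_simple), 4)
```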
Example: Our moods (1/3)
Agreement of surveys
Compute the point estimate and 95% confidence interval for the agreement between our Monday and Wednesday moods.
Needed steps:
- Compute the kappa statistic
- Find confidence interval of kappa
- Interpret the estimate
Example: Our moods (2/3)
Steps 1 and 2: Compute the kappa statistic and find its confidence interval
```r
library(epiR)   # output below was produced with epiR 2.0.63

# 2 x 2 agreement table of our moods (rows and columns = the two days)
moods = matrix(c(100, 40, 10, 30), nrow = 2, byrow = TRUE)
moods
epi.kappa(moods, method = "cohen")$kappa
```

```
     [,1] [,2]
[1,]  100   40
[2,]   10   30
        est         se     lower     upper
1 0.3661972 0.07617362 0.2168996 0.5154947
```
Example: Our moods (3/3)
- Interpret the estimate
The kappa statistic is ____ (95% CI: _____, _____), indicating ______ agreement.
Since the 95% confidence interval does/does not contain 0, we have/do not have sufficient evidence that there is _________ agreement between our mood on Monday and our mood on Wednesday.
Measuring Agreement: Observed Kappas
Guidelines for evaluating Kappa (Rosner textbook)
- Excellent agreement if \(\hat\kappa \geq 0.75\)
- Fair to good agreement if \(0.4 < \hat\kappa < 0.75\)
- Poor agreement if \(\hat\kappa \leq 0.4\)
If \(\hat\kappa<0\), agreement is worse than expected by chance
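The guidelines above can be encoded as a small lookup. A sketch; the function name `kappa_label` is hypothetical, and the cutoffs are exactly those listed (negative values are flagged first):

```r
# Labels follow the Rosner guidelines listed above
kappa_label <- function(k) {
  if (k < 0) {
    "less than chance"
  } else if (k >= 0.75) {
    "excellent"
  } else if (k > 0.4) {
    "fair to good"
  } else {
    "poor"
  }
}

sapply(c(0.366, 0.378, 0.6, 0.8, -0.1), kappa_label)
```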
Measurement of Association So Far
Used contingency tables to test and measure association between two variables
- Categorical outcome variable (Y)
- One categorical explanatory variable (X)
We looked at risk difference, risk ratio, and odds ratio to measure association
Such an association is called a crude association
- No adjustment for possible confounding factors
- Also called marginal association
But contingency-table analyses do not extend well beyond 3 variables
- We can get into stratified contingency tables to bring in a 3rd variable
- But I don’t think it’s worth it because regression can bring in (adjust for) many variables
Extra example in case the mood example fails beautifully
Just in case our data doesn’t work out: Beef Consumption in Survey
A diet questionnaire was mailed to 537 female American nurses on two separate occasions several months apart. The questions asked included the quantities eaten of more than 100 separate food items. The data from the two surveys for the amount of beef consumption are presented in the table below. How can the reproducibility of responses for the beef-consumption data be quantified?
Example: Beef Consumption in Survey (1/3)
Agreement of surveys
Compute the point estimate and 95% confidence interval for the agreement between the two beef-consumption surveys. This is equivalent to asking: are the beef-consumption responses reproducible?
Needed steps:
- Compute the kappa statistic
- Find confidence interval of kappa
- Interpret the estimate
Example: Beef Consumption in Survey (2/3)
Steps 1 and 2: Compute the kappa statistic and find its confidence interval
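```r
library(epiR)

# 2 x 2 table of beef consumption (rows and columns = the two surveys)
beef = matrix(c(136, 92, 69, 240), nrow = 2, byrow = TRUE)
epi.kappa(beef, method = "cohen")$kappa
```

```
        est         se     lower     upper
1 0.3781906 0.04100635 0.2978196 0.4585616
```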
Example: Beef Consumption in Survey (3/3)
- Interpret the estimate
The kappa statistic is 0.378 (95% CI: 0.298, 0.459), indicating poor agreement by the guidelines above (\(\hat\kappa \leq 0.4\)).
Since the 95% confidence interval does not contain 0, we have sufficient evidence that agreement between the two beef-consumption surveys is greater than expected by chance. However, the agreement is poor rather than excellent, so the survey responses are not reliably reproducible.