HW 2 and Lab 2

Nicky Wakim

2025-02-12

Homework 2

A small word on the homework

  • Mostly good work!
  • Main note: Please look at the solutions to make sure you have the correct betas and interpretations when working with categorical variables!!

Lab 2

A note from me

  • I know the lab instructions are wordy
  • This class is really about the technical (“objective”) skills of regression
    • But in order to responsibly practice statistics, you need to critically think about the subjective choices you make
  • And I’m really trying to lay out my thought process in the labs so that you have some idea of the subjective choices that I’m kinda restricting us to
    • And that’s really just bc you’re learning A LOT in this class
    • So taking on extra learning objectives would be overwhelming

Other overall issues

  • No need to load the codebook into R!!!
    • Codebooks are typically opened in Excel and give you extra information on the variables
  • You gotta show all your code!
    • If you got points off for not showing any code, resubmit with the code showing and I’ll give you credit back
  • Be careful when making assumptions about the data
    • Example: someone created a cisgender variable by checking whether SAB (sex assigned at birth) matched gender identity
      • I would be wary of that: definitions of cis and trans are highly personal, so only describe and refer to participants as they self-identify
  • Do not immediately make age categories!! It is important to look at age (numeric) vs. IAT score first
    • Why pixelate your data?? We only categorize if we need to (i.e., if age as numeric is NOT linearly related to IAT score)
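A minimal sketch of one way to run that check before categorizing, using simulated data (the numbers and the data frame `sim` are made up for illustration): fit age as a straight line, then add a quadratic term and compare the two nested models. This is one possible check, not a required lab step.

```r
# Simulated, hypothetical data: swap in your own age and IAT_score columns
set.seed(123)
n <- 500
age <- runif(n, min = 18, max = 80)
IAT_score <- 0.3 + 0.004 * age + rnorm(n, sd = 0.4)
sim <- data.frame(age = age, IAT_score = IAT_score)

# Model 1: age enters as a straight line
fit_linear <- lm(IAT_score ~ age, data = sim)

# Model 2: adds a quadratic term, which allows a curved relationship
fit_quad <- lm(IAT_score ~ age + I(age^2), data = sim)

# F test comparing the nested models: a small p-value suggests curvature,
# i.e., a reason to consider something other than a single linear age term
comparison <- anova(fit_linear, fit_quad)
print(comparison)
```

The visual version of the same check is a scatterplot of age vs. IAT score with a smoother (for example, `geom_smooth()` in ggplot2).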

3.1: What is our target population?

  • This is an important thing to flag as you analyze your results and interpret them for an audience


  • We restricted our population to the US
  • Harvard says the test is only for individuals 18+ years old
  • Test takers need access to the internet and a computer (or phone?)


  • Another thought
    • Sometimes your target population defines your sample
    • Other times your sample defines your target population
  • Here we have a convenience sample, with specific restrictions and accessibility
    • That means the population that we can generalize to is limited to those restrictions and accessibility!!
  • We need to discuss these limitations when we present these results to the world!

3.2 Restrict your analysis to 1 outcome and 9 possible covariates/predictors

You needed to pick the variable from your research question plus 2 others from this list (or 3 if the variable in your research question is not on this list):

  1. Explicit anti-fat bias (att7)
  2. Self-perception of weight (iam_001)
  3. Fat group identity (identfat_001)
  4. Thin group identity (identthin_001)
  5. Controllability of weight of others (controlother_001)
  6. Controllability of weight of yourself (controlyou_001)
  7. Awareness of societal standards (mostpref_001)
  8. Internalization of societal standards (important_001)

You needed to include all 4 demographic variables:

  1. Age (we need to construct it from birthmonth and birthyear plus the test date columns, month and year)
  2. Race (raceomb_002 or raceombmulti)
  3. Ethnicity (ethnicityomb)
  4. Sex assigned at birth (birthSex)
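A minimal sketch of constructing age from the four date columns, using a few hypothetical toy rows with the same column names as the lab code (birthmonth, birthyear, and the test month and year). Because only months are recorded, the exact birthday is unknown; the sketch assumes the birthday has already passed when the birth month equals the test month.

```r
# Hypothetical toy rows; in the lab these columns come from iat_2021
toy <- data.frame(
  birthmonth = c(3, 11, 6),
  birthyear  = c(1990, 1985, 2000),
  month      = c(5, 2, 6),     # test month
  year       = c(2021, 2021, 2021)  # test year
)

# Whole years: subtract birth year, then subtract 1 more if the test month
# falls before the birth month (birthday not yet reached that year).
# Assumption: when the months match, the birthday is treated as passed,
# so age can be off by one year for those rows.
toy$age <- toy$year - toy$birthyear - (toy$month < toy$birthmonth)

# Alternative: age counted in whole months, then converted to years
toy$age_months <- (toy$year * 12 + toy$month) - (toy$birthyear * 12 + toy$birthmonth)
toy$age_frac   <- toy$age_months / 12

print(toy)
```

Inside the lab pipeline the same expression can go in `mutate()`, e.g. `mutate(age = year - birthyear - (month < birthmonth))`.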

Please pick only 2 additional variables:

  1. Education (edu_14)
  2. Gender (genderIdentity)
  3. Self-reported BMI (through self-reported height and weight)
  4. Political identity
  5. Religion


  • Start by loading the data
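library(tidyverse)  # select(), tidyr::drop_na(), and the %>% pipe
library(here)       # builds file paths from the project root

load(file = here("../Project/data/iat_data.rda"))  # restores the saved iat_2021 data frame
iat_2021 = iat_2021 %>% 
  select(IAT_score = D_biep.Thin_Good_all,   # outcome, renamed
         att7, iam_001, identfat_001, 
         myweight_002, myheight_002,
         identthin_001, controlother_001, 
         controlyou_001, mostpref_001,
         important_001, 
         birthmonth, birthyear, month, year,  # month and year = test date
         raceomb_002, raceombmulti, ethnicityomb, 
         edu, edu_14, 
         genderIdentity, 
         birthSex) %>%
  drop_na()  # drops every row missing ANY of the selected variables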
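Because `drop_na()` with no column arguments removes any row with a missing value in any selected column, one variable with lots of missingness can shrink the sample a lot. A minimal base R sketch of checking this before dropping, on a hypothetical toy data frame:

```r
# Hypothetical toy data frame with a few missing values
toy <- data.frame(
  IAT_score = c(0.5, NA, 0.2, 0.8),
  att7      = c(4, 5, NA, 3),
  iam_001   = c(4, 6, 5, 4)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows that drop_na() would keep: rows with no missing values at all
rows_kept <- sum(complete.cases(toy))
cat("Rows before:", nrow(toy), " Rows after dropping NAs:", rows_kept, "\n")
```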

3.3: Manipulating variables that are coded as numeric variables

  • No need to make plots here (that was just part of my example)
    • Plots and tables are a good way to check that you recoded the values correctly
  • Giving the levels an order:
library(dplyr)  # case_match() needs dplyr 1.1.0 or later
iat_2021 = iat_2021 %>% mutate(iam_001_f = case_match(iam_001,
                                  7 ~ "Very overweight",
                                  6 ~ "Moderately overweight",
                                  5 ~ "Slightly overweight",
                                  4 ~ "Neither underweight nor overweight",
                                  3 ~ "Slightly underweight",
                                  2 ~ "Moderately underweight",
                                  1 ~ "Very underweight",
                                  .default = NA) %>% 
             factor(levels = c("Very underweight", # Assigns the level order!
                               "Moderately underweight", 
                               "Slightly underweight", 
                               "Neither underweight nor overweight", 
                               "Slightly overweight", 
                               "Moderately overweight", 
                               "Very overweight")))

  • Now when we print a table, we can see the levels in order
library(gtsummary)  # provides tbl_summary()
iat_2021 %>%
  dplyr::select(iam_001_f) %>%
  tbl_summary()

Characteristic                              N = 242,762¹
iam_001_f
    Very underweight                        1,341 (0.6%)
    Moderately underweight                  5,436 (2.2%)
    Slightly underweight                    17,224 (7.1%)
    Neither underweight nor overweight      106,836 (44%)
    Slightly overweight                     65,418 (27%)
    Moderately overweight                   32,259 (13%)
    Very overweight                         14,248 (5.9%)
¹ n (%)
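For comparison, a base R sketch that does the same recoding in one step with the `levels` and `labels` arguments of `factor()`. It assumes the codes are exactly 1 to 7, as in the `case_match()` code; the short vector of responses is hypothetical.

```r
# Hypothetical responses coded 1-7, as in iam_001
iam_001 <- c(4, 5, 1, 7, 4, 2)

# factor() maps each code in `levels` to the label in the same position,
# and the order of `levels` becomes the level order.
# Any code outside 1-7 would become NA.
iam_001_f <- factor(iam_001,
                    levels = 1:7,
                    labels = c("Very underweight",
                               "Moderately underweight",
                               "Slightly underweight",
                               "Neither underweight nor overweight",
                               "Slightly overweight",
                               "Moderately overweight",
                               "Very overweight"))

levels(iam_001_f)   # levels in the order given above
table(iam_001_f)    # counts print in that same order, including empty levels
```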

3.5 If you chose BMI, create the variable

  • If you worked with BMI, please make sure you followed the help page!
  • Please come double check with me that you are creating it correctly!
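BMI is weight in kilograms divided by height in meters squared. The units of myweight_002 and myheight_002 depend on how the dataset codes them (check the help page and codebook), so this sketch shows hypothetical helper functions for the two common unit systems; 703 is the standard approximate factor for converting lb/in² to kg/m².

```r
# Hypothetical helper: weight in kilograms, height in centimeters
bmi_metric <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100   # convert cm to m
  weight_kg / height_m^2
}

# Hypothetical helper: weight in pounds, height in inches
bmi_imperial <- function(weight_lb, height_in) {
  703 * weight_lb / height_in^2
}

bmi_metric(70, 175)    # about 22.9
bmi_imperial(154, 69)  # about 22.7
```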

4.3 Bivariate exploratory data analysis

  • You only needed to create one plot!!
  • My research question: Is self-perception of weight associated with IAT score?
How I made the plot
library(ggplot2)
ggplot(iat_2021, aes(x = iam_001_f, y = IAT_score)) +
  geom_boxplot() +
  labs(x = "Self-perception of weight", 
       y = "IAT Score", 
       title = "IAT Score by self-perception of weight") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),   # center the title
        axis.text.x = element_text(angle = 30, vjust = 1, hjust = 1))  # tilt long labels

Another option: horizontal boxplots (IAT score on the x-axis, so the category labels are easy to read)
How I made the plot
ggplot(iat_2021, aes(x = IAT_score, y = iam_001_f)) +
  geom_boxplot() +
  labs(y = "Self-perception of weight", 
       x = "IAT Score", 
       title = "IAT Score by self-perception of weight") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Another option: overlaid density curves, one per self-perception category
How I made the plot
ggplot(iat_2021, aes(x = IAT_score, color = iam_001_f)) +
  geom_density() +
  labs(x = "IAT Score", 
       title = "IAT Score by self-perception of weight") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Another option: ridgeline density plots with the ggridges package
How I made the plot
library(ggridges)  # provides geom_density_ridges()
ggplot(iat_2021, aes(x = IAT_score, y = iam_001_f)) +
  geom_density_ridges(alpha = 0.3, show.legend = FALSE) +  
  labs(y = "Self-perception of weight", 
       x = "IAT Score", 
       title = "IAT Score by self-perception of weight") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
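The plots show the shape of each distribution; a small numeric summary gives the exact sample size, mean, and median of IAT score within each category, which is handy to report alongside the plot. A base R sketch on hypothetical toy data (in the tidyverse, `group_by()` plus `summarise()` does the same job):

```r
# Hypothetical toy data; in the lab use iat_2021 with IAT_score and iam_001_f
toy <- data.frame(
  IAT_score = c(0.2, 0.4, 0.6, 0.1, 0.3, 0.9, 0.7),
  iam_001_f = factor(c("Slightly underweight", "Slightly underweight",
                       "Slightly overweight", "Slightly underweight",
                       "Slightly overweight", "Very overweight",
                       "Very overweight"),
                     levels = c("Slightly underweight", "Slightly overweight",
                                "Very overweight"))
)

# Sample size, mean, and median of IAT score within each category
# (results come back in factor level order)
group_n      <- table(toy$iam_001_f)
group_mean   <- tapply(toy$IAT_score, toy$iam_001_f, mean)
group_median <- tapply(toy$IAT_score, toy$iam_001_f, median)

summary_table <- data.frame(
  category = names(group_mean),
  n        = as.integer(group_n),
  mean     = as.numeric(group_mean),
  median   = as.numeric(group_median)
)
print(summary_table)
```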

Multi-selection/multi-response variables

Multi-response/multi-selection variables: raceombmulti and genderIdentity

4 Approaches to Multiple-Race Questions from We All Count

  • These approaches work for any multi-response variable (e.g., genderIdentity), not just race

Final notes

  • For now, I suggest the binary approach!
    • This is the perfect level of pushing ourselves coding-wise and thinking critically about these multi-response variables
  • Take a look at this article: https://doi.org/10.1016/j.socscimed.2017.12.026
    • It gets into some of the considerations and uses of intersectionality in analyses
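One way to implement a binary approach is to turn a multi-response variable into a set of 0/1 indicator variables, one per response option, so a person who selected several options counts in each of them. A minimal base R sketch, assuming (hypothetically) that responses are stored as comma-separated codes; the real format of raceombmulti and genderIdentity may differ, so check the codebook first.

```r
# Hypothetical multi-response column: each entry lists the selected codes,
# separated by commas
toy <- data.frame(
  id = 1:4,
  race_multi = c("1", "2,5", "5", "1,2,5"),
  stringsAsFactors = FALSE
)

# Split each response into its individual codes, trimming stray spaces
selected <- lapply(strsplit(toy$race_multi, split = ",", fixed = TRUE), trimws)

# All codes that appear anywhere in the column
all_codes <- sort(unique(unlist(selected)))

# One 0/1 indicator column per code; a row can have 1 in several columns
for (code in all_codes) {
  toy[[paste0("race_", code)]] <- as.integer(
    vapply(selected, function(x) code %in% x, logical(1))
  )
}

print(toy)
```

The indicator columns are not mutually exclusive, so they go into a model as separate yes/no variables rather than as one categorical variable.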