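# Example ggpairs() call for the bivariate exploration in section 2.3.1
library(tidyverse)  # provides %>%, select(), and ggplot2's theme()
library(GGally)     # provides ggpairs()
ggpair1 <- iat_2024_comp_factor %>%
  select(iat_score, id_thin_f, important_f, pol_id_f, exp_af_f) %>%
  ggpairs()
# Tilt the x-axis labels so long category names stay readable
ggpair1 + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Lab 3 Instructions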
BSTA 512/612
This lab is ready! Nicky (2/5/2026)
- There is an instructions file and a file for you to edit and turn in. Please only work in the file linked below!!
1 Directions
Please turn in your .html file on Sakai. Please let me know if you greatly prefer to submit a physical copy.
You can download the .qmd file for this lab here. Please use the linked qmd file and not this one! (This is specifically the instructions.)
The rest of this lab’s instructions are embedded into the lab activities.
1.1 Purpose
The purpose of this lab is to fit a multiple linear regression model and practice how we would interpret our results for this study.
1.2 Grading
This lab is graded out of 18 points. Each lab will follow the specific rubric on the Project page.
2 Lab activities
If you did not save your dataset in Lab 2: Before starting this lab, you should go back to Lab 2 and save a new .rda file that contains all the new variables from that lab. Then you can load it here!
I have left it up to you to load the needed packages for this lab.
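If saving and loading an .rda file is new to you, here is a minimal sketch of the round trip using base R. The object name `iat_lab2`, its columns, and the file path are all hypothetical stand-ins; use the names from your own Lab 2 file.

```r
# Hypothetical stand-in for the Lab 2 dataset (names and values are made up)
iat_lab2 <- data.frame(
  iat_score   = c(0.41, -0.12, 0.87, 0.30),
  important_f = factor(c("Not at all", "Slightly", "Moderately", "Slightly"))
)

# Save the object, including any new variables, to an .rda file.
# tempdir() is used only so this sketch runs anywhere; in practice,
# point this at a folder inside your project.
rda_path <- file.path(tempdir(), "iat_lab2.rda")
save(iat_lab2, file = rda_path)

# Simulate a fresh session by removing the object, then load it back.
# load() restores the object under the same name it was saved with.
rm(iat_lab2)
load(rda_path)
str(iat_lab2)
```

Note that factor columns keep their level order through `save()` and `load()`, which is one reason to prefer .rda over re-reading a .csv.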
2.1 Restate research question
Please restate your research question from Lab 1.
2.2 Thinking about potential confounders and effect modifiers
Before we explore more of the data, I want us to take a second to think through potential confounders and effect modifiers from the covariates that we selected in Lab 1. I want you to consider how each could alter the relationship between IAT score and your variable of interest (from your research question). For each covariate, explain how it might or might not change the relationship. For example, if our variable of interest is fat group identity, then we may consider that self-perception of size is a confounder since it could be linked with fat group identity and potentially be associated with IAT score.
The purpose of this section is to make sure we are thinking about the relationships between variables in our analysis. I do not want us to make any decisions based solely on the data. I want any changes or manipulations in our variables to be motivated by research-backed evidence.
Finally, for this project, we are most interested in the relationship we identified in our research question! Other variables are supporting this question and improving the model fit, so that we get as close as possible to the true relationship in our research question!
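To make the idea of a confounder concrete, here is a small simulation sketch. Everything in it is made up for illustration (the variables `x`, `y`, `z` and all the effect sizes are assumptions, not values from the IAT data). It only shows the mechanism: when a third variable is linked to both the variable of interest and the outcome, leaving it out shifts the estimated coefficient.

```r
set.seed(512)
n <- 5000

z <- rnorm(n)                      # hypothetical confounder
x <- 0.8 * z + rnorm(n)            # variable of interest, partly driven by z
y <- 0.5 * x + 1.0 * z + rnorm(n)  # outcome; the true coefficient for x is 0.5

# Crude model ignores z; adjusted model includes it
crude    <- unname(coef(lm(y ~ x))["x"])
adjusted <- unname(coef(lm(y ~ x + z))["x"])

round(c(crude = crude, adjusted = adjusted), 2)
```

In this simulation the crude estimate lands well above 0.5, while the adjusted estimate lands close to it. With real data we do not know the true value, which is why the reasoning for each covariate should come from the literature and not from the data alone.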
For each variable, consider how each could alter the relationship between IAT score and your variable of interest (from your research question). For each covariate, explain how it might or might not change the relationship. You can add to the table below, write in bullets, or however you prefer to organize the information.
Note: you can consider the race/ethnicity variables as a single “variable” since we are trying to capture the same demographic measurement (race/ethnicity).
| Variable name | Confounder, Effect modifier, or nothing? | Reasoning (1-2 sentences) |
|---|---|---|
2.3 Continuing data exploration
In this section, we are going to further explore the variables that we might be adjusting for in our model (potential covariates outside the variable of interest in our research question).
2.3.1 Bivariate exploration
We want to look at all other relationships between IAT score and each covariate (outside of the research question variable). Note, these plots will not be included in your final project submission so you do not need to make them presentable. This is just for you to get a sense of the data.
You can use a function called ggpairs() from the GGally package to make a matrix of plots. If you have trouble seeing or interpreting the individual plots, try selecting a subset of covariates or recreating them in ggplot(). Here is an example of how I would use ggpairs() for some of the variables in my dataset:
Notice that some of my variables are ordered nicely and some are not. For example, thin identity (id_thin_f) should be releveled so that the relationship can be interpreted visually.
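As a minimal sketch of releveling, the base R approach is to rebuild the factor with the levels listed in the order you want. The category labels below are hypothetical; the actual labels in id_thin_f may differ, so check yours with `levels()` first.

```r
# Hypothetical responses (the real labels in the dataset may differ)
id_thin <- c("Very thin", "Neither", "Somewhat thin",
             "Very fat", "Somewhat fat", "Neither")

# By default, factor() sorts the levels alphabetically,
# which scrambles the natural order on a plot axis
levels(factor(id_thin))

# Supplying the levels explicitly puts the categories in their natural order
thin_order <- c("Very thin", "Somewhat thin", "Neither",
                "Somewhat fat", "Very fat")
id_thin_f <- factor(id_thin, levels = thin_order)
levels(id_thin_f)
```

Any label that is misspelled in the `levels` argument turns the matching observations into `NA`, so it is worth checking `sum(is.na(id_thin_f))` after releveling.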
As you make and look at the plots, there are a few questions that I want you to consider:
For categorical variables, is there an inherent order? Do the ordered values follow an approximately linear relationship? Are the categories “evenly spaced”? For example, education categories are not necessarily evenly spaced.
Again for categorical variables, is there a natural place to divide the categories up? For example, in education, it might be helpful to control for the fact that students in college might be asked to complete this test as an assignment. Thus, we might make an indicator for individuals in college vs. not. This decision can be informed by our plot, but it should not be driven by our plot!!
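If you do decide an indicator is justified, here is a minimal sketch of how one could be built in base R. The education labels and the choice of which label counts as "in college" are assumptions for illustration only.

```r
# Hypothetical education responses (labels are made up)
education <- c("High school", "Currently in college", "Bachelor's degree",
               "Currently in college", "Graduate degree")

# Indicator: 1 if the person is currently in college, 0 otherwise
in_college <- as.integer(education == "Currently in college")

# Cross-tabulate to confirm each category landed where we intended
table(education, in_college)
```

The cross-tabulation at the end is a quick check that no category was accidentally assigned to the wrong side of the split.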
For each variable outside of your research question, create the appropriate plot to visualize the relationship between IAT score and the variable. Comment if there is an obvious trend or not.
2.3.2 Multivariate exploration
Now we want to extend our plots where we looked at the outcome (IAT score) and our main variable of interest (as identified by our research question). Here, we will run the same plot, but include another variable. This will help us visualize potential confounders or effect modifiers (to be covered). Note that if you made indicator variables (for race, gender identity or any other variable), then you should have a plot for each indicator variable.
You will need to really think about what kind of plot will best display these relationships! IAT score is continuous, and many of your variables of interest are categorical. You may consider side-by-side boxplots where the color is the additional variable. You might also consider a jitter plot or only plotting the means. Remember, your goal for plotting is to get a sense of the relationship only from the plot! Your audience should not have to work hard to understand what the plot is communicating. For example, I wanted to look at IAT score, internalization of societal standards, and political identity. I might make my plot like such:
Code to construct multivariate plot
# Requires ggplot2 and stringr (both load with the tidyverse)
iat_2024_comp_factor %>%
  ggplot(aes(x = important_f,
             y = iat_score,
             color = pol_id_f)) +
  # geom_jitter(size = 2, alpha = .6, width = 0.2) +
  # Plot the mean IAT score for each group, then connect the means
  # within each political identity
  stat_summary(fun = mean, geom = "point", size = 3, shape = 18) +
  stat_summary(fun = mean, geom = "line", aes(group = pol_id_f)) +
  scale_x_discrete(limits = levels(iat_2024_comp_factor$important_f),
                   labels = function(x) str_wrap(x, width = 10)) +
  labs(x = "Importance of weight to sense of self \n (Internalization of societal standards)",
       y = "Mean IAT score",
       title = "Mean IAT scores for importance of weight to sense of self by political identity",
       color = "Political identity")

Note that the above plot is specific to these variables!! Other variables may require a different type of visualization!! Also note that I originally had geom_jitter() in my plot, which would make the plot really hard to understand!! Try uncommenting it to see what I mean by “hard to understand.” Also, think about why I connected the mean IAT scores across the different levels of internalization. Without the lines, I had a hard time connecting each political identity group’s points to identify a trend. Again, try commenting out stat_summary(fun = mean, geom="line") to see what I mean.
Also note that I might decide to collapse political identity into three categories: conservative, neutral, and liberal. This would make the plot easier to understand, and it would also be motivated by the fact that there are not many individuals in some of the political identity categories. However, this decision should not be solely driven by the plot! It should be driven by research-backed evidence that these categories can be collapsed together.
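If the research supports collapsing, here is a minimal sketch of one way to do it in base R, by assigning a named list to `levels()`. The seven political identity labels and the simulated IAT scores are hypothetical; only the technique carries over to your data.

```r
set.seed(612)

# Hypothetical seven-level political identity variable with made-up IAT scores
pol_levels <- c("Strongly conservative", "Moderately conservative",
                "Slightly conservative", "Neutral",
                "Slightly liberal", "Moderately liberal", "Strongly liberal")
dat <- data.frame(
  iat_score = rnorm(70, mean = 0.4, sd = 0.4),
  pol_id_f  = factor(rep(pol_levels, each = 10), levels = pol_levels)
)

# Copy the factor, then map old levels onto three new ones.
# Each list element name is a new level; its value lists the old levels to merge.
dat$pol_id_3 <- dat$pol_id_f
levels(dat$pol_id_3) <- list(
  conservative = pol_levels[1:3],
  neutral      = "Neutral",
  liberal      = pol_levels[5:7]
)

# Check the mapping, then look at the mean IAT score per collapsed group
table(dat$pol_id_f, dat$pol_id_3)
aggregate(iat_score ~ pol_id_3, data = dat, FUN = mean)
```

Keeping the original variable alongside the collapsed copy makes it easy to check the mapping and to go back if the collapsed version turns out not to be justified.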
Please make sure that you have made the needed changes to your plot in Lab 2. I noticed many unordered groups in plots where there should be an inherent order and unreadable axes because the text was not tilted. Please see discussion on Slack for what some students did to achieve these plots.
Here are a few sources that might help you get started with the visualizations:
Create the appropriate plot to visualize the relationship between IAT score, your main variable (in research question), and the variable identified in your subquestion. Comment whether you can determine anything from the plot or not. If you can, is there any indication that the variable is a confounder or effect modifier?
2.4 Fit a multiple linear regression model
We will fit a temporary multiple linear regression model so that we can set up some of the code and interpretations that we will need for our final poster.
Fit a model with your variable of interest (the one in the research question), the variable in your research subquestion, and 1 other covariate. This will not necessarily be your final model, but we can construct interpretations and code that will be useful in your final model.
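Here is a minimal sketch of what a three-predictor model could look like. The data are simulated so that the code runs on its own; the variable `age`, the category labels, and every effect size are assumptions for illustration, not a suggestion of which covariates to choose.

```r
set.seed(2026)
n <- 200

# Simulated stand-in data (labels and effect sizes are made up)
sim <- data.frame(
  important_f = factor(sample(c("Low", "Medium", "High"), n, replace = TRUE),
                       levels = c("Low", "Medium", "High")),
  pol_id_f    = factor(sample(c("Conservative", "Neutral", "Liberal"), n, replace = TRUE),
                       levels = c("Conservative", "Neutral", "Liberal")),
  age         = round(runif(n, 18, 70))
)
sim$iat_score <- 0.3 + 0.1 * (sim$important_f == "High") +
  0.002 * sim$age + rnorm(n, sd = 0.4)

# Variable of interest + subquestion variable + one other covariate
model_sketch <- lm(iat_score ~ important_f + pol_id_f + age, data = sim)
summary(model_sketch)$coefficients
```

Each three-level factor contributes two coefficients, each comparing one level to the reference level (the first level of the factor), so the order of your factor levels determines which comparisons you end up interpreting.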
2.5 Interpret the coefficient estimate(s) for your main covariate
We can interpret the coefficient estimates for the main covariate and use it as a template for our final project poster. You can build off the text that you wrote in Lab 2 for the interpretation of the coefficient estimate for your main variable.
This is not required, but will be very helpful later: You can use inline code to extract the coefficient estimate and confidence intervals for your main variable. Here is an example of how to do this:
# tidy() comes from the broom package
model_temp <- iat_2024_comp_factor %>%
  lm(formula = iat_score ~ important_f + pol_id_f)
model_temp_tidy <- tidy(model_temp, conf.int = TRUE)

Then you can use inline code to extract the estimate and confidence interval like such: `r round(model_temp_tidy$estimate[2], 2)` (for the estimate of important_001) and `r round(model_temp_tidy$conf.low[2], 2)` and `r round(model_temp_tidy$conf.high[2], 2)` (for the lower and upper confidence limits of important_001).
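Indexing by position (`[2]`) silently points at a different coefficient if the model terms change. As an alternative sketch, base R's `coef()` and `confint()` let you pull values by term name instead. The simulated data and the term name `important_fHigh` below are hypothetical; check `names(coef(your_model))` for the names in your own model.

```r
set.seed(3)

# Simulated stand-in data (labels and effect sizes are made up)
sim <- data.frame(
  important_f = factor(rep(c("Low", "High"), each = 50), levels = c("Low", "High")),
  pol_id_f    = factor(rep(c("Conservative", "Liberal"), times = 50))
)
sim$iat_score <- 0.3 + 0.2 * (sim$important_f == "High") + rnorm(100, sd = 0.4)

fit <- lm(iat_score ~ important_f + pol_id_f, data = sim)

# Extract the estimate and 95% confidence interval by term name
term <- "important_fHigh"
est  <- coef(fit)[term]
ci   <- confint(fit, parm = term, level = 0.95)

# Format as text that could be dropped into a written interpretation
sprintf("%.2f (95%% CI: %.2f, %.2f)", est, ci[1, 1], ci[1, 2])
```

The same name-based idea works with the tidy output too, by filtering on the `term` column instead of using a row number.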
Interpret the estimated coefficients for the covariate from your research question. Make sure to include the 95% confidence intervals and what other variables you adjusted for.