Project Report Instructions
BSTA 513/613
1 Directions
You may use this project template to get started on the report. It is your responsibility to meet the formatting guidelines below!!
DO NOT USE THIS SITE PAGE (“Project Report Instructions”, the current page) as your template!!
1.1 Purpose
Project reports serve as a great way to communicate the knowledge learned in our class and connect it to context within research. It is important that we can take a step back from the numbers and analysis to see what questions linear regression can help us answer.
It is really important for you to look back through your labs and connect the work!!
1.2 Formatting guide
- The report will be written in Quarto. Turn in both the qmd and html files
- No code should appear in the html document
  - This means all R code chunks should have #| echo: false
  - This also means warnings and messages should be turned off
- The report should be 10 - 14 paragraphs long
- In 512, many lab reports were a little too long!
- Remember, I know mostly what you did in the labs! This report is meant to synthesize that work into a coherent message/story!
- This often means details of our analysis are lost.
- Tables and figures should NOT have variable names as they appear in the data frame
- Variable names should be understood by a reader
- Variable names should be written in full words
- Include a title or caption for all figures
- Figures and tables appear on the same page as, or close to, where they are first referenced
- Tables and figures are an appropriate size in the html - Nicky is able to read all words in figures and tables
- Writing, spelling, and grammar should be admissible
- This means I can generally follow your thought/what you are trying to communicate
- Some spelling and grammar mistakes are allowed
- I will not take off points if there are a few sprinkled in
- If every or close to every sentence has mistakes, then I will take off points
- Sectioning of the report
- Main sections that are required: Introduction, Statistical Methods, Results, Discussion, Conclusion, Reflection, and References
- You may have an appendix to include additional figures!
- Other sections that might help group specific methods or results
- Title information at the top of the html
- This includes the title itself, your name, and the date
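One way to meet the no-code, no-warnings, no-messages requirement for every chunk at once is to set knitr defaults in a setup chunk near the top of the qmd file. The following is a minimal sketch, assuming the report is rendered with the knitr engine; adding #| echo: false to each chunk individually works as well.

```r
#| include: false
# Setup chunk: hides code, warnings, and messages for all later chunks.
# Assumes the knitr engine is used to render the qmd file.
knitr::opts_chunk$set(
  echo = FALSE,     # do not show R code in the html
  warning = FALSE,  # do not show warnings
  message = FALSE   # do not show messages (e.g., from loading packages)
)
```

After rendering, it is still worth scrolling through the html to confirm that nothing slipped through.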
You can save tables and figures from labs or separate files, then load them in the report
- Save R objects in analyses file:
- Suppose you named Table 1 as table1:
save(table1, file = "table1.Rdata")
- Load R objects in report file:
load(file = "table1.Rdata")
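The save and load steps above can be sketched as a full round trip. The table contents here are a made-up placeholder, and the temporary file path is only used so the example runs anywhere; in a real project you would save next to your analysis file as shown above.

```r
# Hypothetical stand-in for a Table 1 object built in an analysis file.
# The values are invented purely for illustration.
table1 <- data.frame(
  Characteristic = c("Age (years), mean", "Household size, mean"),
  Overall = c(41.2, 2.8)
)

# In the analysis file: write the object to disk
path <- file.path(tempdir(), "table1.Rdata")
save(table1, file = path)

# Simulate starting fresh in the report file
rm(table1)

# In the report file: load() restores the object under its original name
# and (invisibly) returns the names of the objects it loaded
loaded_names <- load(file = path)
print(table1)
```

Note that load() brings the object back under the name it was saved with, so the report should refer to table1, not to a new name.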
1.3 Examples of reports
The following are examples of reports from BSTA 513 with the feedback that I gave them.
Please note that 513 uses a different type of outcome than our class. These examples are meant to help guide you with the formatting and some appropriate content.
Also note that these were converted to PDFs so I could write in feedback. Some of the table and figure sizes were distorted. They need to be legible in the html.
The above reports have code showing in their html. Remember that I am asking you to hide all code, warnings, and messages.
1.4 Grading
The project report is out of 36 points. Note that the Statistical Methods and Results sections are graded on an 8-point scale, while all other components are graded on a 4-point scale.
In the final lab, I gave you the option to do LASSO regression and focus on prediction. I know this created some confusion since we mostly set up the project as a question of association in Labs 1-3. We can still interpret the odds ratios from LASSO regression. I will be fairly lenient if reports are confused between prediction and association aims. I will try to give feedback on it, but I will not penalize any minor confusion.
Another word: My process starts harsh. I want to give you as much feedback as possible, and this will also be reflected in lower assigned scores. At this point, I put the report grades into Sakai. I check to see if anyone’s overall course letter grade goes down. If fewer than ~5 course grades go down, then I revisit those students' project reports. If their report fails to demonstrate the most important learning objectives from the course, then I will keep the lower grade. If they demonstrate an understanding of the most important learning objectives, then I will adjust their score to increase their course grade. If more than 5 grades go down, then I revisit everyone’s reports and make a class-wide grade bump in all reports.
Component | 4 points | 3 points | 2 points | 1 point | 0 points |
---|---|---|---|---|---|
Formatting | Lab submitted on Sakai (or by email if late) with .html file. Report is written in complete sentences with very few grammatical or spelling errors. With little editing, the report can be distributed. | Lab submitted on Sakai (or by email if late) with .html file. Report is written in complete sentences with some (around 2 per section) grammatical or spelling errors. With some editing, the report can be distributed. | Lab submitted on Sakai (or by email if late) with .html file. Report is written in complete sentences, but has many grammatical or spelling errors. With major editing, the report can be distributed. | Lab submitted on Sakai (or by email if late) with .html file. Report is written in complete sentences, but is very hard to follow due to grammar mistakes. | Lab not submitted on Sakai (or by email if late) with .html file. Report is not written with complete sentences. With major editing, the report can be distributed. |
Figures and work | All requested output is displayed, including 2 required figures and tables, and at least one additional figure. Figures and tables look professional, are easily interpreted by the reader, and easily convey the intended message. | All requested output is displayed, including 2 required figures and tables, and at least one additional figure. For the most part, figures and tables look professional, are easily interpreted by the reader, and easily convey the intended message. A few mistakes in the figures are made. | All requested output is displayed, including 2 required figures and tables, and at least one additional figure. Figures and tables look semi-professional, are not so easily interpreted by the reader, and convey the intended message only after some work by the reader. Some mistakes in the figures are made. | All requested output is displayed, including 2 required figures and tables, and at least one additional figure. Figures and tables do not look professional, are not easily interpreted by the reader, and/or do not convey the intended message. Many mistakes in the figures are made. | Requested output is not displayed. Missing one or more figures. |
Introduction | Provides a good background for the research question, includes motivation for the question, and references previous research that justifies this analysis. | Provides a decent background for the research question and includes motivation for the question. Previous research is mentioned, but feels disconnected from the current analysis. | Provides a decent background for the research question and includes motivation for the question. Previous research is mentioned, but feels disconnected from the current analysis. | Does not provide a background that connects to the research question. Motivation and previous research are not mentioned. | No introduction included. |
Methods (8 points) | Describes statistical methods concisely and highlights pertinent information to the reader (listed Sections below). Demonstrates proper analyses were performed. | Describes statistical methods and highlights pertinent information to the reader (listed Sections below). Details were omitted or added that were not needed to explain the overarching methods. Demonstrates proper analyses were performed. | Describes statistical methods and highlights pertinent information to the reader (listed Sections below). Details were omitted or added that were not needed to explain the overarching methods. Some incorrect analyses included in the description. | Describes statistical methods, but lacks clarity. Demonstrates a lack of understanding about the overall process of regression analysis. Incorrect analyses included in the description. | No methods included. |
Results (8 points) | Correctly interprets coefficients for the explanatory variable and identifies any other interesting trends. Highlights pertinent results to the reader (listed Sections below). | Correctly interprets coefficients, but does not correctly incorporate the interaction (if in the model). Highlights pertinent results to the reader (listed Sections below). | Incorrectly interprets coefficients. Highlights pertinent results to the reader (listed Sections below). | Incorrectly interprets coefficients. Omits pertinent results to the reader (listed Sections below). | No results included. |
Discussion | Thoroughly and concisely discusses limitations and considerations of the results, and their consequences. | Discusses limitations and considerations of the results and their consequences, but misses some big considerations. | Discusses limitations and considerations of the results, but does not discuss the consequences. | Discusses limitations and considerations of the results, but misses many considerations and does not discuss consequences. | No discussion included. |
Conclusion and References | For the conclusion, main research question is answered and statistical caveats described to non-technical person. References are mostly cited consistently within the report, and in the Reference section. This includes the data source! | For the conclusion, main research question is answered and statistical caveats described to non-technical person. References are sometimes cited consistently within the report, and in the Reference section. This includes the data source! | For the conclusion, main research question is somewhat answered (but focus is not on the research question) and statistical caveats described to non-technical person. References are sometimes cited consistently within the report, and in the Reference section. This includes the data source! | For the conclusion, main research question is somewhat answered (but not the focus at all) and statistical caveats are not described. References are not cited consistently within the report, and in the Reference section. This includes the data source! | For the conclusion, main research question is not answered. Or references are not included at all. |
Reflection | Discusses all the labs. Reflection demonstrates an understanding of each lab’s purpose and overarching connection from the labs to overall project. | Discusses 3 out of 4 labs. | Discusses 2 out of 4 labs. | Discusses 1 out of 4 labs. | No reflection included |
In formatting, an example of a report with little editing needed is one that has zero to some grammar or spelling mistakes, no code chunks showing, and no output warnings nor messages showing.
Professional figures mean
- I can read the words and numbers in the html
- Variable names are converted from the data frame version to readable text
  - For example: iam_001 does not show up on axes, instead something like: Response to "Currently, I am..."
- Colors are only used if conveying information
- Intended message of the figure is easily understood
  - If you are trying to show a trend of proportion of food insecurity vs. an ordered categorical variable, then the variable is ordered on the x-axis
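As an illustration of the points above on readable labels and ordered categories, here is a minimal base R sketch. The data are made up, and the column names (income_cat, food_insecure) and category labels are hypothetical; substitute the variables from your own analysis.

```r
# Made-up toy data; column names and categories are hypothetical
dat <- data.frame(
  income_cat = c("$50k+", "<$25k", "$25k-$50k", "<$25k", "$50k+", "$25k-$50k"),
  food_insecure = c(0, 1, 0, 1, 0, 1)
)

# Put the categories in their natural order rather than alphabetical order
dat$income_cat <- factor(
  dat$income_cat,
  levels = c("<$25k", "$25k-$50k", "$50k+")
)

# Proportion reporting food insecurity within each income category
prop_by_income <- tapply(dat$food_insecure, dat$income_cat, mean)

# Axis labels and title use full words, not data frame variable names
barplot(
  prop_by_income,
  xlab = "Annual household income",
  ylab = "Proportion reporting food insecurity",
  main = "Food insecurity by household income category",
  ylim = c(0, 1)
)
```

Because the factor levels are set explicitly, the bars appear from lowest to highest income, which makes any trend across categories visible at a glance.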
For the references
- I will not be overly critical about the formatting
- By consistency, I mean that if you are citing things like (Last Name, Year), it doesn’t suddenly change to number citations.
- If you would like to use Quarto’s citation tool, you can! I actually pair it with Zotero and it works beautifully! (But I would not embark on this if you haven’t used Zotero before)
1.5 Guide on figures
- The only figure that I require in the report is Table 1
- If you are focusing on prediction (LASSO), then you may not need to present your odds ratios at all.
- I think it will be good to do this since LASSO does allow for interpretations and we framed our project as such
- If you have a lot of predictors (e.g., you included categorical income or family size), please do not put a giant table or forest plot in the middle of your results
- Even if you did not include these variables, you can keep the full table/plot in the appendix
- Put this big table or plot in the appendix!
- When it comes to reporting the odds ratios: you can simply include the estimated odds ratio in text (as an interpretation)
- This should ONLY be for your variable of interest!
- If your variable has multiple categories, discuss the trend and then report an example interpretation to demonstrate how each category should be interpreted
- You may wish to include a table showing all the odds ratios for each group
- Another important table may include model assessment values like AUC if you are comparing models
- If you only have one model, you may not need to make a table
- When deciding to create a table or figure, ask yourself: Does this help the reader understand what I am saying in text? Can it be conveyed just as well with words?
- Things that are conveyed better with tables/figures
- Comparisons of the same measurement across different models
- Multiple calculated values for the same measurement
- Like estimated odds ratios for all variables in a model
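For the case above where a table of estimated odds ratios is useful, the following sketch shows one way to build it from a fitted logistic regression using only base R. The data are simulated, the variable names are hypothetical, and the Wald intervals from confint.default() are an assumption made for simplicity; your labs may have used a different interval method or a table package.

```r
# Simulated toy data; variable names and effect sizes are hypothetical
set.seed(513)
n <- 300
toy <- data.frame(
  age = runif(n, 18, 80),
  household_size = sample(1:6, n, replace = TRUE)
)
toy$food_insecure <- rbinom(
  n, 1,
  plogis(-1 - 0.02 * (toy$age - 50) + 0.3 * toy$household_size)
)

# Logistic regression for the binary outcome
fit <- glm(food_insecure ~ age + household_size, data = toy, family = binomial)

# Exponentiate coefficients and Wald confidence limits (intercept dropped)
est <- coef(fit)[-1]
ci <- confint.default(fit)[-1, , drop = FALSE]

# Readable labels instead of data frame variable names
or_table <- data.frame(
  "Variable" = c("Age (per year)", "Household size (per person)"),
  "Odds ratio" = exp(est),
  "95% CI lower" = exp(ci[, 1]),
  "95% CI upper" = exp(ci[, 2]),
  check.names = FALSE,
  row.names = NULL
)
print(or_table, digits = 3)
```

With many predictors, a table like this belongs in the appendix, with only the odds ratio for the variable of interest interpreted in the text.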
2 Sections
2.1 Title
- Purpose: Create an identifiable name for your research project that includes the main research question’s variables and gives some context to the analysis or results
2.2 Introduction
- Length: 1-2 paragraphs
- Purpose: Introduce the project motivation, data, and research question. It also includes any background information relevant for understanding the analysis and relevant previous work.
- This section is non-technical.
- By reading just the introduction, someone without a technical background should have an idea of what the study was about, and why it is important
- You may start with the introduction written in Lab 1, but you should edit it and make sure it flows into your report well!
- Should contain some references
- Should include a sentence that states your research question (but NOT using a question). For example: “This study investigates the association between food insecurity and age.”
2.3 Statistical Methods
- Length: 3-5 paragraphs
- Purpose: Describe the analyses that were conducted and methods used to select variables and check diagnostics
- Important to keep in mind: methods typically describe your approach and process, not the results of that process
- For example: I might say “We investigated the linearity of each continuous covariate visually. If continuous variables were not linear, then we divided the variable into categories using existing guidelines from <insert reference here> or creating quartiles.”
- In the methods section, I would NOT say: “We investigated the linearity of each continuous covariate visually. We found that age was not linearly related to log-odds of food insecurity. Thus, we categorized age into the following groups: ___, ____, ____, ____, and ____.”
- The last two sentences about age would be more appropriate in the Results section
- Some important methods to discuss (You may divide these into your sections, not necessarily with these names)
- General approach to the dataset
- 3-5 sentences
- Did you need to do any quality control?
- Missing data: complete case analysis or imputation based on Lab 3 choices
- 1 sentence
- Can be included in the Exploratory data analysis section
- Variable transformations
- This includes a description of analyses for Table 1 and what statistics were used to summarize the variables
- More on creation of Table 1, not discussing the results of Table 1
- Includes (not required)
- Categorizing a continuous variable (even if performed in model selection)
- Did you use family size/household size/number of children as categorical or continuous?
- Did you make any transformations to income?
- Using scoring for an ordered categorical variable (that is not your explanatory variable)
- 1 sentence per changed variable
- Making a categorical variable a factor in our R code is NOT a change!
- Model building: we performed purposeful selection OR LASSO regression
- 3 sentences max
- This is where we discuss our process from Lab 4
- For purposeful selection, includes:
- Describe purposeful selection: combining existing literature, clinical significance, and analysis
- How did you build the model? Describe the process in ONE SENTENCE
- Did you consider confounders and effect modifiers?
- For LASSO:
- Mention that you used LASSO regression with important facts like the penalty used, percent testing set, included interactions in the selection process
- Model diagnostics and model fit
- 2-5 sentences
- This is where we discuss our process from Lab 4
- Includes
- Process of investigating model diagnostics and fit
- If assumptions were not met, what process did you use to fix it?
- Again, we will NOT discuss the results of the model diagnostics
- For example: “We investigate the change in Pearson residual for every observation. For observations with large changes (exceeding a change of 4), we check the feasibility of their measurements.”
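Two of the methods mentioned above, categorizing a continuous variable into quartiles and flagging observations by their change in Pearson residual, can be sketched in base R as follows. This is a rough illustration on simulated data with hypothetical variable names. It assumes the "change in Pearson residual" refers to the change in the Pearson chi-square statistic when an observation is removed (squared Pearson residual divided by one minus leverage), computed per observation rather than per covariate pattern; check your Lab 4 code for the exact quantity you used.

```r
# Simulated toy data; variable names are hypothetical
set.seed(513)
n <- 200
toy <- data.frame(age = runif(n, 18, 80))
toy$food_insecure <- rbinom(n, 1, plogis(0.5 - 0.03 * (toy$age - 18)))

# Categorize a continuous variable into quartiles
toy$age_quartile <- cut(
  toy$age,
  breaks = quantile(toy$age, probs = seq(0, 1, by = 0.25)),
  include.lowest = TRUE  # keep the minimum value in the first category
)

# Logistic regression with the categorized variable
fit <- glm(food_insecure ~ age_quartile, data = toy, family = binomial)

# Change in Pearson chi-square for each observation:
# squared Pearson residual divided by (1 - leverage)
pearson <- residuals(fit, type = "pearson")
leverage <- hatvalues(fit)
delta_chisq <- pearson^2 / (1 - leverage)

# Flag observations exceeding the cutoff of 4 for a closer look
flagged <- toy[delta_chisq > 4, ]
nrow(flagged)
```

In the Statistical Methods section you would describe this process in a sentence or two; what the flagged observations turned out to be belongs in the Results.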
2.4 Results
- Length: ~2-3 paragraphs
- Purpose: Relay the results from our sample’s analysis typically focusing on the numbers and interpretations
- The goal is not to interpret every single variable in the model but rather to show that you are proficient in using the model output to address the research question, using the interpretations to support your conclusions.
- Focus on the variables that help you answer the research question and that provide relevant context for the reader.
- Some important results to discuss (also could be sections)
- Sample data set statistics (Table 1)
- 3-5 sentences
- Include a brief description of the sample’s characteristics
- Table 1 should be referenced and appear here!
- Final model
- 1-2 sentences
- Describe final model (or models if comparing a few)
- What variables were included in your final model?
- What interactions with your explanatory variable did you include?
- Interpret the model coefficients in the context of the research question
- 1 paragraph (maybe 2 in special cases)
- When in doubt, ask Nicky if your analysis is a special case
- Interpreting the explanatory variable’s relationship with food insecurity is the most important thing to report!!
- When doing this, make sure you account for ALL interactions: If your explanatory variable has multiple interactions and you are trying to interpret one, then what does that mean about the other variables involved in the other interactions? If this is confusing, please make an appointment with me!!
- Results of model diagnostics and model fit if there is anything worth noting
- Tables & figures
- The following are required tables or figures
- Table 1 summarizing participant characteristics both overall and stratified by your primary independent variable
- Other possible tables or figures
- Table or figure with regression results
- Can be a forest plot
- If you have A LOT of coefficient estimates, the forest plot may not work well!
- 1-3 figures that you think are helpful in understanding the results, for example
- DAG explaining connection between variables (if you did this)
- Table or figure to compare model fit statistics (if you did this)
- Table or figure for unadjusted relationship between outcome and explanatory variables
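To make the point above about accounting for interactions concrete: when the explanatory variable interacts with another variable, its odds ratio is no longer a single number but depends on the level of that other variable. The sketch below uses simulated data with hypothetical variables (age in decades as the explanatory variable and a 0/1 rural indicator as the effect modifier) and is only meant to show the arithmetic.

```r
# Simulated toy data; variable names and effect sizes are hypothetical
set.seed(613)
n <- 500
toy <- data.frame(
  age_decades = runif(n, 2, 8),
  rural = rbinom(n, 1, 0.4)
)
toy$food_insecure <- rbinom(
  n, 1,
  plogis(-0.3 - 0.25 * toy$age_decades + 0.5 * toy$rural +
           0.15 * toy$age_decades * toy$rural)
)

# Model with an interaction between the explanatory variable and rural
fit <- glm(food_insecure ~ age_decades * rural, data = toy, family = binomial)
b <- coef(fit)

# Odds ratio for a 10-year increase in age when rural = 0:
# only the main effect applies
or_age_not_rural <- exp(b[["age_decades"]])

# Odds ratio for a 10-year increase in age when rural = 1:
# main effect plus the interaction coefficient
or_age_rural <- exp(b[["age_decades"]] + b[["age_decades:rural"]])

c(not_rural = or_age_not_rural, rural = or_age_rural)
```

If the explanatory variable has more than one interaction, every interacting variable has to be fixed at a stated value before an odds ratio can be reported, and the interpretation should name those values.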
2.5 Discussion
- Length: 2-3 paragraphs
- Purpose: Discuss the results and give them context outside of the sample and its analysis
- Some important things to include
- Include a paragraph on the limitations of the results
- You don’t need to hit all the limitations, but think about the big ones (generalizability? independence of samples? large sample size vs. clinical significance? the way we handled variables?)
- After limitations, discuss the positive parts of the results
- What can we do with these results? What impact can it have?
- Any overarching trends that are worth noting?
- Should contain some references
2.6 Reflection
- How did each Lab help you build towards your project report?
- Length: 1-2 sentences per lab
- Did you change anything from your labs?
- Length: 1-5 sentences per lab
- 1 sentence to say no changes were made or up to 5 sentences describing the changes you made
- For example, if I gave you feedback about changing a variable to a factor, you do not need to discuss that here.
- However, if you made a serious change to how you built your model from Lab 4 (after seeing my feedback) then put that here.
2.7 References
- Include your references here!
- Your introduction should have references, especially when discussing the social science behind the analysis
- You must reference the WBNS data source!!
2.8 Optional: Appendix
- Additional figures, tables, and plots that you may want to include