TB sections 1.2, 1.4
2024-10-02
In evolutionary biology, parental investment refers to the amount of time, energy, or other resources devoted towards raising offspring.
We will be working with the frog dataset, which originates from a 2013 study² about maternal investment in a frog species. Reproduction is a costly process for female frogs, necessitating a trade-off between individual egg size and total number of eggs produced.
Researchers were interested in investigating how maternal investment varies with altitude. They collected measurements on egg clutches found at breeding ponds across 11 study sites; for 5 sites, the body size of individual female frogs was also recorded.
clutch # | altitude | latitude | egg.size | clutch.size | clutch.volume | body.size |
---|---|---|---|---|---|---|
1 | 3,462.00 | 34.82 | 1.95 | 181.97 | 177.83 | 3.63 |
2 | 3,462.00 | 34.82 | 1.95 | 269.15 | 257.04 | 3.63 |
3 | 3,462.00 | 34.82 | 1.95 | 158.49 | 151.36 | 3.72 |
150 | 2,597.00 | 34.05 | 2.24 | 537.03 | 776.25 | NA |
NA means the measured value for body size in clutch #150 is missing.

Variable | Description |
---|---|
altitude | Altitude of the study site in meters above sea level |
latitude | Latitude of the study site measured in degrees |
egg.size | Average diameter of an individual egg, to the nearest 0.01 mm |
clutch.size | Estimated number of eggs in clutch |
clutch.volume | Volume of egg clutch in mm³ |
body.size | Length of egg-laying frog in cm |
Numerical variables
Numerical variables take on numerical values, such that numerical operations (sums, differences, etc.) are reasonable.
Discrete: only take on integer values (e.g., # of family members)
Continuous: can take on any value within a specified range (e.g., height)
Categorical variables
Categorical variables take on values that are names or labels; the possible values are called the variable’s levels.
Variable types in R
We have not covered R yet, but I want this to serve as a reference for you!
R labels variable types a little differently.
Mapping R types to variable types:

R type | variable type | description |
---|---|---|
integer | discrete | integer-valued numbers |
double or numeric | continuous | numbers that are decimals |
factor | categorical | categorical variables stored with levels (groups) |
character | categorical | text, “strings” |
logical | categorical | boolean (TRUE, FALSE) |
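
As a rough illustration of the table above, the sketch below creates one object of each R type and asks R what it is. The object names and values are made up for this example and are not taken from the frog dataset.

```r
# Hypothetical example values (not from the frog dataset)
n_eggs   <- 182L                              # the L suffix stores an integer
egg_diam <- 1.95                              # decimals are stored as doubles
site     <- factor(c("high", "low", "high"))  # categorical, stored with levels
label    <- "clutch A"                        # character (text string)
measured <- TRUE                              # logical (TRUE or FALSE)

class(n_eggs)     # "integer"
class(egg_diam)   # "numeric"
typeof(egg_diam)  # "double"
class(site)       # "factor"
levels(site)      # "high" "low"
class(label)      # "character"
class(measured)   # "logical"
```

Note that `class()` reports a decimal number as "numeric" while `typeof()` reports "double", which is why the table lists both names.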
Let’s start looking into ways to summarize and explore numerical data!
Warning!
I decided to keep some R code in these slides. It’s going to be a little confusing now, but I thought it would be a worthwhile reference as soon as we get through R basics.
Sample mean
The average value of the observations in a sample
\[\overline{x} = \frac{x_1+x_2+\cdots+x_n}{n} = \sum_{i=1}^{n}\frac{x_i}{n}\]
where \(x_1, x_2, \ldots, x_n\) represent the \(n\) observed values in a sample
Example: What is the mean clutch volume in the frog dataset?
\[\overline{x} = \sum_{i=1}^{431}\frac{x_i}{431}\]
Answer: the mean clutch volume is 882.5 \(\text{mm}^3\).
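
To make the formula concrete, here is a minimal sketch that computes a mean both by hand and with R's built-in `mean()`. It uses only the four clutch volumes displayed in the data preview, so its result differs from the full-data answer of 882.5. The vector name `clutch_volume` is chosen for this example.

```r
# Clutch volumes (mm^3) from the four rows shown in the data preview;
# this is NOT the full 431-clutch frog dataset.
clutch_volume <- c(177.83, 257.04, 151.36, 776.25)

n <- length(clutch_volume)               # number of observations
mean_by_hand <- sum(clutch_volume) / n   # add everything up, divide by n
mean_builtin <- mean(clutch_volume)      # R's built-in function

mean_by_hand   # 340.62
mean_builtin   # same value

# With the full data loaded as a data frame named frog (an assumption
# about your setup), the equivalent call would be mean(frog$clutch.volume).
```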
Median
The middle value of the observations in a sample
The median is the 50th percentile, meaning that 50% of the observations fall below it and 50% fall above it.
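
A small sketch of how the median is found, again using only the four clutch volumes displayed in the data preview (not the full dataset). With an even number of observations, R averages the two middle values; with an odd number, it returns the single middle value.

```r
# Four clutch volumes (mm^3) from the data preview, not the full dataset
clutch_volume <- c(177.83, 257.04, 151.36, 776.25)

sort(clutch_volume)              # 151.36 177.83 257.04 776.25

# Even n: average of the two middle values, (177.83 + 257.04) / 2
median_even <- median(clutch_volume)
median_even                      # 217.435

# Odd n: the single middle value of the first three observations
median_odd <- median(clutch_volume[1:3])
median_odd                       # 177.83
```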
Standard deviation (SD)
(Approximately) the average distance between a typical observation and the mean
Let’s calculate the sample standard deviation for the clutch volume:
\(s = \sqrt{\sum_{i=1}^{n}\frac{(x_i - \overline{x})^2}{n-1}} = \sqrt{\sum_{i=1}^{431}\frac{(x_i - 882.5)^2}{431-1}}\)
R can easily do this for us!
Answer: The standard deviation of the clutch volume is 379.05 \(\text{mm}^3\).
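
The sketch below walks through the standard deviation formula step by step and checks it against R's built-in `sd()`. It uses the four clutch volumes displayed in the data preview, so it will not reproduce the full-data value of 379.05. Variable names are chosen for this example.

```r
# Four clutch volumes (mm^3) from the data preview, not the full dataset
clutch_volume <- c(177.83, 257.04, 151.36, 776.25)

n     <- length(clutch_volume)
x_bar <- mean(clutch_volume)

deviations   <- clutch_volume - x_bar   # distance of each value from the mean
squared_devs <- deviations^2            # squaring removes the signs
sd_by_hand   <- sqrt(sum(squared_devs) / (n - 1))   # divide by n - 1, not n

sd_builtin <- sd(clutch_volume)         # R's built-in, also uses n - 1

all.equal(sd_by_hand, sd_builtin)       # TRUE
```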
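For symmetric bell-shaped data, about 68% of the observations fall within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs.
These percentages are based on the percentages of a true normal distribution.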
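
As a quick check of where the normal-distribution percentages (roughly 68%, 95%, and 99.7%) come from, this sketch uses `pnorm()`, R's normal cumulative distribution function. The helper `within_sd()` is a hypothetical name introduced just for this example.

```r
# Proportion of a normal distribution within k standard deviations of the
# mean: area below +k minus area below -k on the standard normal curve.
within_sd <- function(k) pnorm(k) - pnorm(-k)

percent_within <- round(100 * within_sd(1:3), 1)
percent_within   # 68.3 95.4 99.7
```

Real data are never exactly normal, so observed percentages will only approximate these values.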
The \(p^{th}\) percentile is the observation such that \(p\%\) of the remaining observations fall below this observation.
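
Percentiles can be computed in R with `quantile()`. The sketch below uses a made-up vector of the integers 1 through 100 rather than the frog data. Be aware that software packages use slightly different conventions for percentiles; by default R interpolates between neighboring observations, so a percentile need not be one of the observed values.

```r
# Hypothetical data: the integers 1 to 100
x <- 1:100

# 25th, 50th, and 75th percentiles (Q1, median, Q3)
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
q   # 25.75 50.50 75.25 with R's default interpolation

# Check the definition: half of the observations fall below the 50th percentile
prop_below_median <- mean(x < q["50%"])
prop_below_median   # 0.5
```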
Interquartile range (IQR)
The distance between the third and first quartiles. \[IQR = Q_3 - Q_1\]
5 number summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
151.4 609.6 831.8 882.5 1096.5 2630.3
\[IQR = Q_3 - Q_1 = 1096.5 - 609.6 = 486.9\]
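
Output like the summary above is what R's `summary()` prints for a numeric vector. The sketch below shows the call on the four clutch volumes from the data preview (not the full dataset), computes the IQR two ways, and repeats the slide's arithmetic with the full-data quartiles.

```r
# Four clutch volumes (mm^3) from the data preview, not the full dataset
clutch_volume <- c(177.83, 257.04, 151.36, 776.25)

summary(clutch_volume)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# IQR by hand: third quartile minus first quartile
quartiles   <- quantile(clutch_volume, probs = c(0.25, 0.75))
iqr_by_hand <- unname(quartiles[2] - quartiles[1])

# IQR with R's built-in function
iqr_builtin <- IQR(clutch_volume)
all.equal(iqr_by_hand, iqr_builtin)   # TRUE

# The slide's calculation, using the full-data quartiles reported above
iqr_slide <- 1096.5 - 609.6
iqr_slide   # 486.9
```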
Summary statistics are called robust estimates if extreme observations have little effect on their values.
Estimate | Robust? |
---|---|
Sample mean | ❌ |
Median | ✅ |
Standard deviation | ❌ |
IQR | ✅ |
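
To see the robustness table in action, this sketch compares the four statistics on a small made-up sample before and after one value is replaced by an extreme observation. The numbers are hypothetical and chosen so the arithmetic is easy to follow.

```r
# Hypothetical sample, and the same sample with its largest value made extreme
x         <- c(2, 4, 6, 8, 10)
x_extreme <- c(2, 4, 6, 8, 100)

# Not robust: the mean and SD change substantially
mean(x)          # 6
mean(x_extreme)  # 24
sd(x)            # about 3.16
sd(x_extreme)    # much larger

# Robust: the median and IQR do not change at all here
median(x)          # 6
median(x_extreme)  # 6
IQR(x)             # 4
IQR(x_extreme)     # 4
```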
Lesson 2 Slides