Lab #07: Central Limit Theorem Intro

Learning Goals

In this lab you will…

Getting started

Packages

We will use the tidyverse and tidymodels packages in this lab.

library(tidyverse)
library(tidymodels)

Data

Today’s data is a subset of the PanTHERIA dataset on mammalian life history traits (Jones, Kate E., et al. “PanTHERIA: a species‐level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090‐184.” Ecology 90.9 (2009): 2648-2648).

pantheria = read_csv("data/pantheria_subset.csv")

Exercises

Instructions

Soricidae, aka ‘shrews’. Image from [Wikipedia](https://en.wikipedia.org/wiki/Shrew).

Exercise 1

The yellow bat, an example of Vespertilionidae, aka ‘microbats’. Image from [Wikipedia](https://en.wikipedia.org/wiki/Scotophilus).

To begin, let’s clean the data. Values of -999 should in fact be NA. To convert these to NA, use the code chunk below as a template, replacing the question mark with the appropriate value.

pantheria[pantheria == ?] = NA
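The template relies on logical-matrix indexing: comparing a data frame to a value returns a TRUE/FALSE matrix of the same shape, and assigning to that selection overwrites every matching cell. A minimal sketch on a made-up data frame follows; the name `toy`, its columns, and the sentinel value -1 are illustrative assumptions, not part of the lab data.

```r
# Hypothetical toy data; -1 plays the role of a "missing" sentinel here.
toy <- data.frame(mass = c(4.2, -1, 7.9), litter = c(-1, 3, 5))

# toy == -1 is a logical matrix; assigning NA overwrites every TRUE cell.
toy[toy == -1] <- NA

# Count the missing values per column to confirm the replacement.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Counting NA values before and after the replacement is a quick way to confirm that the sentinel was actually found.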

Exercise 2

Exercise 3

Ex 3 Hint: We only observe each species in a family once. Search in your browser of choice: “how many species in vespertilionidae family?” and “how many species in soricidae family?”

The goal of this analysis is to use CLT-based inference to understand the distribution of body mass. The idea is that if the CLT holds, we can treat the distribution of the sample mean as normal and thus easily generate a normal null distribution to test hypotheses.
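To see why the CLT is useful here, a small simulation can help. The sketch below is not part of the lab data: it draws repeated samples from a right-skewed Exponential population (an arbitrary choice for illustration, as are the sample size and seed) and compares the spread of the sample means to the value the CLT predicts.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n <- 50       # hypothetical sample size
reps <- 5000  # number of simulated samples

# Draw `reps` samples from a right-skewed Exponential(rate = 1) population
# (mean 1, standard deviation 1) and record each sample mean.
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The CLT predicts the sample means are roughly Normal(1, 1 / sqrt(n)).
clt_se <- 1 / sqrt(n)
c(mean = mean(sample_means), sd = sd(sample_means), clt_se = clt_se)
```

A histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped even though the population itself is strongly skewed.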

Before we use the CLT, let’s check whether the necessary conditions are satisfied. For each condition, indicate whether it is satisfied and provide a brief explanation supporting your response. Be sure to check both families of interest.
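One way to organize these checks is sketched below. It assumes the two conditions are the usual ones: independence (checked here with the common 10% rule of thumb for sampling without replacement) and a sufficiently large sample (commonly taken as at least 30 observations). All numbers are made up for illustration; substitute the counts you find for each family.

```r
# Hypothetical values for illustration only; replace with the actual counts.
n_sample <- 40       # species observed in the data for one family (made up)
n_population <- 500  # total species in that family (made up)

# Independence: when sampling without replacement, a common rule of thumb
# is that the sample should be less than 10% of the population.
independence_ok <- n_sample < 0.10 * n_population

# Sample size: a common rule of thumb is at least 30 observations.
sample_size_ok <- n_sample >= 30

c(independence = independence_ok, sample_size = sample_size_ok)
```

These thresholds are rules of thumb rather than hard cutoffs, so a written justification should still accompany each check.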

Exercise 4

Is the mean adult body mass (abm) of Soricidae significantly greater than 10 g?

State the null and alternative hypotheses. Write them in both words and mathematical notation.

Exercise 5

Ex 5 Hint: Use \sim to create the mathematical tilde. The resulting statement reads: “x bar is normally distributed”.

Let \(\bar{x}_s\) be the sample mean of Soricidae.

Given the Central Limit Theorem and the hypotheses from the previous exercise,

Exercise 6

Compute the p-value associated with our observed statistic (sample mean).

Ex 6 Hint: pnorm finds a left-tailed probability by default, and we are interested in a right-tailed probability.
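The hint can be illustrated with a short sketch. The sample mean and standard error below are made-up numbers, not values from the lab data; only the null value of 10 comes from the question in Exercise 4. The variable names are likewise illustrative.

```r
# Hypothetical numbers, not taken from the lab data.
x_bar <- 10.8  # observed sample mean (made up)
mu_0 <- 10     # mean under the null hypothesis
se <- 0.5      # standard error of the sample mean (made up)

# lower.tail = FALSE asks pnorm for P(X > x_bar) instead of P(X <= x_bar).
p_right <- pnorm(x_bar, mean = mu_0, sd = se, lower.tail = FALSE)

# Equivalent: one minus the default left-tailed probability.
p_right_alt <- 1 - pnorm(x_bar, mean = mu_0, sd = se)
```

Both forms give the same right-tailed probability; `lower.tail = FALSE` tends to be numerically safer when the tail probability is very small.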

Exercise 7

Let’s compute the p-value in a slightly different way.

To begin, use R as a calculator to compute a standardized score called a “Z-score”. Save this quantity as Z. The formula to compute Z is below:

\[ Z = \frac{\bar{x} - \mu_0}{SE} \]

Here, \(\bar{x}\) is the sample mean, \(\mu_0\) is the mean under the null, and \(SE\) is the standard error.
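As a sketch of how the Z-score connects to a p-value, the code below standardizes a made-up sample mean and evaluates the right tail of the standard normal distribution. The values of `x_bar` and `se` are hypothetical; only the null value of 10 comes from Exercise 4.

```r
# Hypothetical numbers, not taken from the lab data.
x_bar <- 10.8  # observed sample mean (made up)
mu_0 <- 10     # mean under the null hypothesis
se <- 0.5      # standard error of the sample mean (made up)

# Standardize: how many standard errors the sample mean lies above mu_0.
Z <- (x_bar - mu_0) / se

# Right-tailed p-value on the standard normal scale (mean 0, sd 1).
p_from_z <- pnorm(Z, lower.tail = FALSE)

# The same probability computed on the original scale, for comparison.
p_direct <- pnorm(x_bar, mean = mu_0, sd = se, lower.tail = FALSE)
```

The two p-values agree because standardizing only shifts and rescales the normal distribution; it does not change tail probabilities.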

Exercise 8

\[ T = \frac{\bar{x} - \mu_0}{SE_{\hat{\sigma}}} \]

Here, \(SE_{\hat{\sigma}}\) denotes the standard error computed from the sample standard deviation \(\hat{\sigma}\).

How does the T statistic compare to the Z score? Why?
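A minimal sketch of the T computation follows. It assumes the standard error is estimated as the sample standard deviation divided by the square root of the sample size, and that the p-value comes from a t distribution with n - 1 degrees of freedom; the data vector `x` is made up. The statistic is stored as `T_stat` rather than `T` because `T` is a built-in alias for `TRUE` in R.

```r
# Hypothetical body masses in grams (made up for illustration).
x <- c(8.1, 12.4, 9.7, 11.3, 10.9, 13.2, 9.4, 10.6)
mu_0 <- 10  # mean under the null hypothesis

n <- length(x)
se_hat <- sd(x) / sqrt(n)  # standard error based on the sample sd
T_stat <- (mean(x) - mu_0) / se_hat

# Right-tailed p-value from a t distribution with n - 1 degrees of freedom.
p_t <- pt(T_stat, df = n - 1, lower.tail = FALSE)

# Cross-check against R's built-in one-sample t-test.
check <- t.test(x, mu = mu_0, alternative = "greater")
```

Comparing `T_stat` and `p_t` with `check$statistic` and `check$p.value` is a convenient way to catch arithmetic slips.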

Submission

There should only be one submission per team on Gradescope.

Grading

| Component | Points |
|---|---|
| Ex 1 | 2 |
| Ex 2 | 10 |
| Ex 3 | 6 |
| Ex 4 | 4 |
| Ex 5 | 5 |
| Ex 6 | 6 |
| Ex 7 | 6 |
| Ex 8 | 5 |
| Workflow & formatting | 6 |