library(tidyverse)
library(tidymodels)

Learning goals

Review

Suppose the distribution of the number of minutes users engage with apps on an iPad has a mean of 8.2 minutes and standard deviation of 1 minute. Let x be the number of minutes users engage with apps on an iPad, μ be the population mean and σ the population standard deviation. Then,

$$x \sim N(8.2, 1)$$

Suppose you take a sample of 60 randomly selected app users and calculate the mean number of minutes they engage with apps on an iPad, $\bar{x}$. The conditions (independence & sample size/distribution) to apply the Central Limit Theorem are met. Then by the Central Limit Theorem

$$\bar{x} \sim N(8.2, 1/\sqrt{60})$$

    #add code
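One way to check the review result is by simulation. The sketch below assumes, purely for illustration, that individual engagement times are normally distributed (the text above states only the mean and standard deviation). The seed, the number of replicates, and the object names are arbitrary choices.

```r
# Standard error of the sample mean predicted by the CLT: sigma / sqrt(n)
mu <- 8.2      # population mean (minutes)
sigma <- 1     # population standard deviation (minutes)
n <- 60        # sample size

se_clt <- sigma / sqrt(n)
se_clt

# Simulate 10,000 samples of size n and record each sample mean.
# Assumption (not from the source): individual times follow N(mu, sigma).
set.seed(1)
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# The simulated means should center near mu with spread near se_clt
mean(sample_means)
sd(sample_means)
```

If the CLT result holds, the last two values should land close to 8.2 and to `se_clt`, respectively.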

Data: Pokemon

We will be using the pokemon dataset, which contains information about 42 randomly selected Pokemon (from all generations). You may load in the dataset with the following code:

pokemon <- read_csv("data/pokemon.csv")

In this analysis, we will use CLT-based inference to draw conclusions about the mean height among all Pokemon species.

Exercise 1

Let’s start by looking at the distribution of height_m, the typical height in meters for a Pokemon species, using a visualization and summary statistics.

ggplot(data = pokemon, aes(x = height_m)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "black") + 
  labs(x = "Height (in meters)", 
       y = "Count", 
       title = "Distribution of Pokemon heights")

pokemon %>%
  summarise(mean_height = mean(height_m), 
            sd_height = sd(height_m), 
            n_pokemon = n())
mean_height   sd_height   n_pokemon
<dbl>         <dbl>       <int>
0.9285714     0.4974499   42

In the previous lecture (and in the review questions), we were given the mean, μ, and standard deviation, σ, of the population. That is unrealistic in practice (if we knew μ and σ, we wouldn’t need to do statistical inference!).

Today we will use our sample data and the Central Limit Theorem to draw conclusions about μ, the mean height in the population of Pokemon.

  • What is the point estimate for μ, i.e., the “best guess” for the mean height of all Pokemon?

  • What is the point estimate for σ, i.e., the “best guess” for the standard deviation of the distribution of Pokemon heights?

Exercise 2

Before moving forward, let’s check the conditions required to apply the Central Limit Theorem. Are the following conditions met:

  • Independence?
  • Sample size/distribution?

Exercise 3

By the Central Limit Theorem,

$$\bar{x} \sim N(\mu, \sigma/\sqrt{n}) \qquad Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

where Z is a standardized score such that $Z \sim N(0, 1)$.

  • Describe the distribution of Z in words.

In practice, we can’t calculate the standardized score Z, so instead we will use the standardized score T when conducting inference for a population mean…

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

  • How do Z and T differ?

  • What is the estimated standard error $s/\sqrt{n}$ for the Pokemon data?

# add code

T is a new standardized score that follows a t distribution with $n-1$ degrees of freedom. It is written as $t_{n-1}$. We will use the $t_{n-1}$ distribution to help us conduct hypothesis tests and construct confidence intervals.
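As a rough illustration of the estimated standard error and of how the t distribution compares with the standard normal, the sketch below plugs in the summary statistics reported in Exercise 1 rather than reading the data file. The object names are illustrative.

```r
# Summary statistics reported in Exercise 1
s <- 0.4974499   # sample standard deviation of height_m
n <- 42          # number of Pokemon in the sample

# Estimated standard error of the sample mean: s / sqrt(n)
se_hat <- s / sqrt(n)
se_hat

# Degrees of freedom for the t distribution
df <- n - 1

# The t distribution has heavier tails than N(0, 1):
# compare the probability of falling below -2 under each
pt(-2, df = df)
pnorm(-2)
```

The lower-tail probability under the t distribution is the larger of the two, which reflects the extra uncertainty from estimating σ with s.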

Exercise 4

The mean height of humans is about 1.65 meters. We would like to test whether the mean height of Pokemon is less than the mean height of humans.

  • State the null and alternative hypotheses in words and statistical notation.

  • Calculate the T test statistic.

$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

where μ0 is the null hypothesized value.

# add code
  • What is the distribution of the test statistic, T?

  • Now let’s calculate the p-value. Fill in the code below to use the pt() function to calculate the p-value. For q input the value of the test statistic, and for df input the degrees of freedom.

#pt(q = ____, df = ____)
  • State what the p-value means.

  • State the conclusion in the context of the data using a significance level of α=0.05.
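The calculation in this exercise can be sketched as follows, again using the summary statistics from Exercise 1 in place of the data file. Because the alternative hypothesis is that the mean is less than 1.65, the p-value is the area in the lower tail, which is what pt() returns by default. Object names are illustrative.

```r
x_bar <- 0.9285714   # sample mean height (m)
s <- 0.4974499       # sample standard deviation (m)
n <- 42              # sample size
mu_0 <- 1.65         # null hypothesized mean (mean human height, m)

# Test statistic: (x_bar - mu_0) / (s / sqrt(n))
t_stat <- (x_bar - mu_0) / (s / sqrt(n))
t_stat

# One-sided (lower-tail) p-value from the t distribution with n - 1 df
p_value <- pt(q = t_stat, df = n - 1)
p_value
```

Up to rounding of the inputs, these values should agree with the t_test() output reported in the "CLT-based calculations in infer" section.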

Exercise 5

We would like to construct a 90% confidence interval for the mean height of Pokemon species. The general equation for a confidence interval is

$$\text{estimate} \pm (\text{critical value}) \times SE$$

Specifically, the confidence interval for the mean is

$$\bar{x} \pm t^*_{n-1} \times \frac{s}{\sqrt{n}}$$

The second part of the equation, $t^*_{n-1} \times \frac{s}{\sqrt{n}}$, is called the margin of error.

We already know $\bar{x}$ and $s/\sqrt{n}$, so let’s talk about $t^*_{n-1}$. This value is determined based on the confidence level, C. It is the point on the t distribution with $n-1$ degrees of freedom such that the area between $-t^*_{n-1}$ and $t^*_{n-1}$ is C.

  • What is the critical value $t^*_{n-1}$ for our 90% confidence interval of the mean Pokemon height?
## add code
  • Now calculate the 90% confidence interval for the mean Pokemon height.
# add code
  • Interpret the interval in the context of the data.
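A minimal sketch of the interval calculation, using the summary statistics from Exercise 1 in place of the data file. For a confidence level C, the critical value leaves (1 - C)/2 in each tail, so it is the 1 - (1 - C)/2 quantile of the t distribution. Object names are illustrative.

```r
x_bar <- 0.9285714   # sample mean height (m)
s <- 0.4974499       # sample standard deviation (m)
n <- 42              # sample size
conf_level <- 0.90   # confidence level C

# Critical value: the 0.95 quantile of the t distribution with n - 1 df
t_star <- qt(p = 1 - (1 - conf_level) / 2, df = n - 1)
t_star

# Margin of error and interval endpoints
margin <- t_star * s / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci
```

Up to rounding of the inputs, the endpoints should agree with the lower_ci and upper_ci values reported by t_test() in the "CLT-based calculations in infer" section.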

CLT-based calculations in infer

Hypothesis test

  • Conduct the hypothesis test from Exercise 4 using the t_test() function.
pokemon %>%
  t_test(response = height_m, 
         alternative = "less", 
         mu = 1.65, 
         conf_int = FALSE)
statistic   t_df    p_value       alternative   estimate
<dbl>       <dbl>   <dbl>         <chr>         <dbl>
-9.398718   41      4.38446e-12   less          0.9285714

Confidence interval

  • Calculate the 90% confidence interval from Exercise 5 using the t_test() function.
pokemon %>%
  t_test(response = height_m, 
         conf_int = TRUE, 
         conf_level = 0.9) %>%
  select(lower_ci, upper_ci)
lower_ci    upper_ci
<dbl>       <dbl>
0.7993968   1.057746