Bulletin

Today

By the end of today you will

library(tidyverse)
library(tidymodels)

Data

push_pull = read_csv("data/push_pull.csv")
push_pull %>%
  slice(1:3, 24:26)

The push_pull dataset comes from a “mini study” by Mountain Tactical Institute.

26 individuals completed 1 of 2 exercise regimens for 3.5 weeks to increase their push-ups and pull-ups. Codebook below:

push_pull = push_pull %>%
  mutate(
    pct_push_inc = (push2 / push1) - 1,
    pct_pull_inc = (pull2 / pull1) - 1
  )
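
A quick check of what these new columns hold may help. The counts below are hypothetical (not taken from the study) and only illustrate that the result is a proportion rather than a percentage, so a 15% increase corresponds to 0.15.

```r
# Hypothetical counts (NOT from the study): 20 push-ups before, 25 after
push1 = 20
push2 = 25

# Same formula as in the mutate() call above
pct_push_inc = (push2 / push1) - 1
pct_push_inc  # 0.25, i.e. a 25% increase
```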

The three generate functions

1. permute

  • description: shuffles the values of one variable relative to the other, breaking any association between them
  • typical case: is there a difference in the outcome between groups?
  • associated null: group membership does not matter, i.e. group A and group B have the same outcome
  • Example (test for independence):

Two exercise regimens:

  • “density” training
  • “grease-the-groove” (gtg)

We want to know: is the average pull-up percent increase of a gtg trainee significantly greater than that of a density trainee?

Fundamentally, does the categorical variable training affect the average percentage increase in pull-ups?

State the null hypothesis

What we want to do to simulate data under this null:

random_training = sample(push_pull$training, replace = FALSE)

push_pull %>%
  select(pct_pull_inc) %>%
  mutate(random_training = random_training)
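
The single shuffle above gives one simulated data set. A minimal sketch of repeating the shuffle many times with infer (loaded as part of tidymodels) is shown here on hypothetical toy data, not the push_pull study. The data frame `toy`, its column `pct_inc`, and the simulated values are assumptions made purely for illustration.

```r
library(dplyr)
library(infer)

set.seed(10)

# Hypothetical toy data (NOT the push_pull study): 10 trainees per group
toy = tibble(
  training = factor(rep(c("density", "gtg"), each = 10)),
  pct_inc  = c(rnorm(10, mean = 0.10, sd = 0.10),
               rnorm(10, mean = 0.20, sd = 0.10))
)

# Observed difference in means, gtg minus density
obs_diff = toy %>%
  specify(pct_inc ~ training) %>%
  calculate(stat = "diff in means", order = c("gtg", "density"))

# Shuffle the pairing 1000 times and recompute the statistic each time
null_dist = toy %>%
  specify(pct_inc ~ training) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("gtg", "density"))

# One-sided p-value: share of shuffled differences at least as large as observed
null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "greater")
```

Each replicate plays the role of one `sample(..., replace = FALSE)` call above. The `order` argument fixes the direction of the subtraction, which matters for a one-sided alternative.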

Exercise 1:

  • Complete the hypothesis specification above by stating the alternative.
  • Perform a hypothesis test, compute the p-value and state your conclusion with \(\alpha = 0.05\).

set.seed(1)
# code here

2. draw

  • description: flip a coin with probability p, or roll a die with probabilities associated with each side. See here for reference to the multinomial setting.
  • typical case: test the proportion of a binary outcome
  • null: proportion \(p\) is some fixed number
  • Example (hypothesis test for a single proportion):

“Most people who train consistently will see at least a 15% increase in push-ups over a 3.5-week training period.”

Breaking it down:

  • “Most” i.e. “greater than 50%”. This indicates we should examine a proportion.

What’s the null?

  • “will see at least a 15% increase”. Each person either increases by at least 15% over 3.5 weeks or does not. This is our binary outcome.

push_up = push_pull %>%
  select(pct_push_inc) %>%
  mutate(over15pct =
           pct_push_inc >= 0.15)  # "at least 15%" includes exactly 15%

push_up

  • Under our null, TRUE and FALSE in the over15pct column are equally likely, i.e. \(p = 0.5\).
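
To see what drawing under this null looks like, here is a minimal sketch with infer on hypothetical toy outcomes, not the study data. The data frame `toy`, its column `improved`, and the 18-of-26 split are assumptions chosen only for illustration.

```r
library(dplyr)
library(infer)

set.seed(20)

# Hypothetical toy outcomes (NOT the study data): 18 of 26 people improved
toy = tibble(
  improved = factor(c(rep("yes", 18), rep("no", 8)))
)

# Observed proportion of "yes"
obs_prop = toy %>%
  specify(response = improved, success = "yes") %>%
  calculate(stat = "prop")

# Flip a fair coin 26 times per replicate, 1000 replicates,
# and record the proportion of successes in each
null_dist = toy %>%
  specify(response = improved, success = "yes") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")

# One-sided p-value: share of simulated proportions at least as large as observed
null_dist %>%
  get_p_value(obs_stat = obs_prop, direction = "greater")
```

Note that the simulated data depend only on the sample size and the null value `p`; the observed outcomes enter only through `obs_prop`.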

Exercise 2

  • Complete the hypothesis specification above by stating the alternative.
  • Perform a hypothesis test, compute the p-value and state your conclusion with \(\alpha = 0.01\).

set.seed(2)
# code here

3. bootstrap

  • description: re-sample your data with replacement
  • typical case (in hypothesis testing): does the mean equal a specific value? Does the median equal a specific value?
  • null: what would the data have looked like if nothing but the point estimate changed?
  • Example:

“The mean age of push-up/pull-up training participants is greater than 30.”

What’s the null?

Bootstrapping does the following…

# find observed statistic
obs_mean_age = push_pull %>%
  drop_na(age) %>%
  summarize(meanAge = mean(age)) %>%
  pull()

# shift every age by (observed mean - 30) so the shifted column has mean 30
age_and_null = push_pull %>%
  select(age) %>%
  drop_na(age) %>%
  mutate(nullAge = age - (obs_mean_age - 30))

# show data frame
age_and_null

# show the means of each column
age_and_null %>%
  summarize(meanAge = mean(age),
            mean_nullAge = mean(nullAge))

If we take bootstrap samples from this new nullAge column, we are sampling from data with the same variability as our original data, but with a mean of exactly 30. This is a nice way to explore the null!
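
The shift-then-resample idea can be sketched with infer as follows, using hypothetical toy ages rather than the study data. The data frame `toy` and its values are assumptions for illustration. As far as we understand infer's behavior, `hypothesize(null = "point", mu = 30)` followed by a bootstrap `generate()` performs the same recentering as the nullAge column above.

```r
library(dplyr)
library(infer)

set.seed(30)

# Hypothetical toy ages (NOT the study data); their mean is 34
toy = tibble(age = c(24, 27, 29, 31, 33, 35, 36, 38, 41, 46))

# Observed sample mean
obs_mean = toy %>%
  specify(response = age) %>%
  calculate(stat = "mean")

# Recenter the ages at 30, then resample with replacement 1000 times
# and record the mean of each bootstrap sample
null_dist = toy %>%
  specify(response = age) %>%
  hypothesize(null = "point", mu = 30) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# One-sided p-value: share of bootstrap means at least as large as observed
null_dist %>%
  get_p_value(obs_stat = obs_mean, direction = "greater")
```

The bootstrap means should cluster around 30 while keeping the spread of the original ages, which is exactly the property described above.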

Exercise 3

  • Complete the hypothesis specification above by stating the alternative.
  • Perform a hypothesis test, compute the p-value and state your conclusion with \(\alpha = 0.05\).

set.seed(3)
# code here