Bulletin

Due Friday:
- homework 04
- quiz 05
Other:
- Lab hopefully short enough to finish in section

Today

By the end of today you will

finish bootstrapping confidence intervals practice from last time (AE15)
practice the three generate options: permute, draw, and bootstrap

library(tidyverse)
library(tidymodels)

Data

push_pull = read_csv("data/push_pull.csv")

push_pull %>%
  slice(1:3, 24:26)

ABCDEFGHIJ0123456789

participant_id <dbl>	age <dbl>	push1 <dbl>	push2 <dbl>	pull1 <dbl>	pull2 <dbl>	training <chr>
1	41	41	45	16	17	density
2	32	35	44	9	11	density
3	44	33	38	10	11	density
24	36	31	60	9	15	gtg
25	50	35	42	9	12	gtg
26	34	23	39	9	13	gtg

The push_pull dataset comes from a “mini study” by mountain tactical institute.

26 individuals completed 1 of 2 exercise regiments for 3.5 weeks to increase their pushups and pullups. Codebook below:

participant_id: unique identifier for each participant
age: age of participant
push1/2: push-ups at beginning and end of program respectively
pull1/2: pull-ups at beginning and end of program respectively
training: which training protocol the individual participated in

push_pull = push_pull %>%
  mutate(
    pct_push_inc = (push2 / push1 ) - 1,
    pct_pull_inc = (pull2 / pull1) - 1)

The three generate functions

1. `permute`

description: permutes variables
typical case: is there a difference in the outcome between groups?
associated null: group membership does not matter i.e. group A and group B have the same outcome
Example (test for independence):

Two exercise regimes:

“density” training
“grease-the-groove” (gtg)

We want to know, is the average pull-up percent increase of a gtg trainee significantly greater than a density trainee?

Fundamentally, does the categorical variable training affect the average percentage increase in pull-ups?

State the null hypothesis

What we want to do to simulate data under this null:

random_training = sample(push_pull$training, replace = FALSE)

push_pull %>%
  select(pct_pull_inc) %>%
  mutate(random_training = random_training)

Exercise 1:

Complete the hypothesis specification above by stating the alternative.
Perform a hypothesis test, compute the p-value and state your conclusion with .

set.seed(1)
# code here

2. `draw`

description: flip a coin with probability p, or roll a die with probabilities associated with each side. See here for reference to the multinomial setting.
typical case: test the proportion of a binary outcome
null: proportion is some fixed number
Example (hypothesis test for a single proportion):

“Most people who train consistently will see at least a 15% increase in push-ups over an 3.5 week training period.”

Breaking it down:

“Most” i.e. “greater than 50%”. This indicates we should examine a proportion.

What’s the null?

“will see at least a 15% increase”. Each person either increases by 15% over 3.5 weeks or does not. This is our binary outcome.

push_up = push_pull %>%
  select(pct_push_inc) %>%
  mutate(over15pct =
           pct_push_inc > 0.15)

push_up

Our “NULL” means that the TRUE and FALSE in the over15pct column are equally likely.

Exercise 2

Complete the hypothesis specification above by stating the alternative.
Perform a hypothesis test, compute the p-value and state your conclusion with .

set.seed(2)
# code here

3. `bootstrap`

description: re-sample your data with replacement
typical case (in hypothesis testing): does the mean equal a specific value? Does the median equal a specific value?
null: what would the data have looked like if nothing but the point estimate changed?
Example:

“The mean age of push-up/pull-up training partcipants is greater than 30”.

What’s the null?

Bootstrapping does the following…

# find observed statistic
obs_mean_age = push_pull %>%
  drop_na(age) %>%
  summarize(meanAge = mean(age)) %>%
  pull()

# subtract observed_mean - desired_mean from age
age_and_null = push_pull %>% 
  select(age) %>%
  drop_na(age) %>%
  mutate(nullAge = age - (obs_mean_age - 30))

# show data frame
age_and_null

# show the means of each column
age_and_null %>%
  summarize(meanAge = mean(age),
            mean_nullAge = mean(nullAge))

If we take bootstrap samples from this new nullAge column, we are sampling from data with the same variability as our original data, but a different mean. This is a nice way to explore the null!

Exercise 3

Complete the hypothesis specification above by stating the alternative.
Perform a hypothesis test, compute the p-value and state your conclusion with .

set.seed(3)
# code here

Review: using generate with tidymodels

03.02.2022

Bulletin

Today

Data

The three generate functions

1. `permute`

Exercise 1:

2. `draw`

Exercise 2

3. `bootstrap`

Exercise 3

Review: using generate with tidymodels

03.02.2022

Bulletin

Today

Data

The three generate functions

1. permute

Exercise 1:

2. draw

Exercise 2

3. bootstrap

Exercise 3

1. `permute`

2. `draw`

3. `bootstrap`