Due Friday:
Other:
By the end of today you will understand the generate() options: permute, draw, and bootstrap.
library(tidyverse)
library(tidymodels)
push_pull = read_csv("data/push_pull.csv")
push_pull %>%
  slice(1:3, 24:26)
The push_pull dataset comes from a “mini study” by the Mountain Tactical Institute.
26 individuals completed 1 of 2 exercise regimens for 3.5 weeks to increase their push-ups and pull-ups. Codebook below:
participant_id: unique identifier for each participant
age: age of participant
push1/2: push-ups at beginning and end of program, respectively
pull1/2: pull-ups at beginning and end of program, respectively
training: which training protocol the individual participated in
# percent increase = (end / beginning) - 1
push_pull = push_pull %>%
  mutate(
    pct_push_inc = (push2 / push1) - 1,
    pct_pull_inc = (pull2 / pull1) - 1)
permute
Two exercise regimes:
We want to know: is the average pull-up percent increase of a gtg trainee significantly greater than that of a density trainee?
Fundamentally, does the categorical variable training affect the average percentage increase in pull-ups?
State the null hypothesis
What we want to do to simulate data under this null:
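# shuffle the training labels (sampling without replacement keeps the group sizes)
random_training = sample(push_pull$training, replace = FALSE)
# pair each observed percent increase with a shuffled label
push_pull %>%
  select(pct_pull_inc) %>%
  mutate(random_training = random_training)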
set.seed(1)
# code here
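One way the permutation step could look with the infer verbs that tidymodels loads is sketched below. This is a sketch under assumptions, not necessarily the intended solution: toy is a hypothetical stand-in for push_pull with made-up values so the block runs on its own, and the one-sided direction follows the "significantly greater" question above.

```r
library(tidyverse)
library(tidymodels)  # attaches infer: specify(), hypothesize(), generate(), calculate()

# Hypothetical toy data standing in for push_pull (values are made up)
toy = tibble(
  training = rep(c("gtg", "density"), each = 5),
  pct_pull_inc = c(0.50, 0.35, 0.60, 0.20, 0.45,
                   0.10, 0.30, 0.05, 0.25, 0.15)
)

# Observed statistic: mean(gtg) - mean(density)
obs_diff = toy %>%
  specify(response = pct_pull_inc, explanatory = training) %>%
  calculate(stat = "diff in means", order = c("gtg", "density"))

set.seed(1)
# Null distribution: shuffle the training labels 1000 times,
# recomputing the difference in means after each shuffle
null_dist = toy %>%
  specify(response = pct_pull_inc, explanatory = training) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("gtg", "density"))

# One-sided p-value: share of shuffled differences at least as large as observed
p_value = null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "greater")
p_value
```

Swapping toy for push_pull should give the analysis for the real data, assuming the training column holds exactly the two labels gtg and density.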
draw
“Most people who train consistently will see at least a 15% increase in push-ups over a 3.5-week training period.”
Breaking it down:
What’s the null?
push_up = push_pull %>%
  select(pct_push_inc) %>%
  mutate(over15pct = pct_push_inc > 0.15)
push_up
Under the null, TRUE and FALSE in the over15pct column are equally likely.
set.seed(2)
# code here
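A possible version of the draw step is sketched below, again with a hypothetical toy_push data frame of made-up values in place of push_pull. It assumes the null is a proportion of 0.5 (the two outcomes equally likely) and that "most people" translates to a one-sided "greater" alternative. The logical column is converted to character so that success = "TRUE" is unambiguous.

```r
library(tidyverse)
library(tidymodels)  # attaches infer

# Hypothetical toy data standing in for push_pull (values are made up)
toy_push = tibble(
  pct_push_inc = c(0.40, 0.22, 0.10, 0.31, 0.18,
                   0.05, 0.27, 0.16, 0.12, 0.35)
) %>%
  mutate(over15pct = as.character(pct_push_inc > 0.15))  # "TRUE" / "FALSE"

# Observed statistic: proportion of participants above a 15% increase
obs_prop = toy_push %>%
  specify(response = over15pct, success = "TRUE") %>%
  calculate(stat = "prop")

set.seed(2)
# Null distribution: draw samples of the same size where "TRUE" has probability 0.5
null_dist = toy_push %>%
  specify(response = over15pct, success = "TRUE") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")

# One-sided p-value: share of simulated proportions at least as large as observed
p_value = null_dist %>%
  get_p_value(obs_stat = obs_prop, direction = "greater")
p_value
```

Unlike permute, draw does not reuse the observed rows: each replicate is simulated from the hypothesized proportion alone.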
bootstrap
“The mean age of push-up/pull-up training participants is greater than 30.”
What’s the null?
Bootstrapping does the following…
# find observed statistic
obs_mean_age = push_pull %>%
  drop_na(age) %>%
  summarize(meanAge = mean(age)) %>%
  pull()
# shift every age by (observed mean - null mean of 30),
# so the shifted column nullAge has mean exactly 30
age_and_null = push_pull %>%
  select(age) %>%
  drop_na(age) %>%
  mutate(nullAge = age - (obs_mean_age - 30))
# show data frame
age_and_null
# show the means of each column
age_and_null %>%
  summarize(meanAge = mean(age),
            mean_nullAge = mean(nullAge))
If we take bootstrap samples from this new nullAge column, we are sampling from data with the same variability as our original data, but a different mean. This is a nice way to explore the null!
set.seed(3)
# code here
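The bootstrap step could be sketched as follows. The toy_age data frame is hypothetical (made-up ages) so the block is self-contained. To my understanding, when hypothesize(null = "point", mu = 30) is followed by a bootstrap generate(), infer recenters the data at 30 before resampling, which matches the manual nullAge shift shown earlier.

```r
library(tidyverse)
library(tidymodels)  # attaches infer

# Hypothetical toy data standing in for push_pull (values are made up)
toy_age = tibble(age = c(24, 31, 36, 29, 42, 33, 27, 38, 35, 30))

# Observed statistic: sample mean age
obs_mean = toy_age %>%
  specify(response = age) %>%
  calculate(stat = "mean")

set.seed(3)
# Null distribution: resample with replacement from data recentered at mu = 30
null_dist = toy_age %>%
  specify(response = age) %>%
  hypothesize(null = "point", mu = 30) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# One-sided p-value: share of bootstrap means at least as large as observed
p_value = null_dist %>%
  get_p_value(obs_stat = obs_mean, direction = "greater")
p_value
```

With the real data, drop_na(age) would come before specify(), as in the code above that computes obs_mean_age.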