By the end of today you will
library(tidyverse)
# Our 12 coin flips: 1 = heads, 0 = tails
outcome = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
flips = data.frame(outcome)
cat(flips$outcome)
## 0 0 0 0 0 0 0 0 1 0 0 0
Without realizing it, we set up a null hypothesis: the coin is fair.
Mathematically, we state this null hypothesis in terms of \(p\), the probability of the coin landing heads.
What should \(p\) equal if the coin is fair?
\(H_0\): \(p = ?\)
This is a hypothesis about “truth”: the “true” parameter \(p\) is the long-run proportion of heads this particular coin would produce if it were flipped infinitely many times. We don’t know the true \(p\), so we make an assumption about it.
All we have is a sample of coin flips. In our sample, the proportion of heads is 1/12. We call this a statistic; a statistic is just a function of the data. Mathematically, we write
\(\hat{p} = 1/12\)
In words, our “sample p”, often called “p-hat” (\(\hat{p}\)), is 1/12.
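Since heads are coded as 1 and tails as 0, the statistic \(\hat{p}\) is simply the mean of the outcomes. A minimal base-R sketch (a tidyverse version would do the same thing inside summarize()):

```r
# Recreate the 12 flips from above: 1 = heads, 0 = tails
outcome = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
flips = data.frame(outcome)

# p-hat: the proportion of heads, i.e. the mean of the 0/1 outcomes
p_hat = mean(flips$outcome)
p_hat
# [1] 0.08333333
```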
Now we ask ourselves: “Is \(\hat{p}\) so different from our assumed value of \(p\) that we must reject that assumption?”
First, we need to know what 12 coin flips would look like “under the null”, in other words, “if the null were actually true”.
set.seed(1)
# Simulate 10,000 experiments of 12 flips of a fair coin; each value is the number of heads
null_heads = rbinom(10000, 12, 0.5)
null = as.data.frame(null_heads)
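To get a feel for the null distribution before the exercise, it can help to tabulate the simulated counts. This sketch re-runs the simulation so it stands alone; each entry of null_heads is the number of heads in one simulated set of 12 fair flips, so the table shows how often each count from 0 to 12 occurred, including how rarely 1 or fewer heads (what we observed) shows up.

```r
# Re-run the simulation from above so this block is self-contained
set.seed(1)
null_heads = rbinom(10000, 12, 0.5)
null = as.data.frame(null_heads)

# How often did each number of heads (0 to 12) occur across the 10,000 experiments?
table(null$null_heads)

# Under the null, the expected number of heads is 12 * 0.5 = 6
mean(null$null_heads)
```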
Exercise 1: Create a new column p that is the proportion of heads in each row (i.e., in each simulated set of 12 flips), and create a histogram with bins = 10.
To answer this question, we also need an alternative hypothesis. Our options are: \(H_A: p < 0.5\), \(H_A: p > 0.5\), or \(H_A: p \neq 0.5\).
As we’ll see in a minute, the choice of alternative affects our conclusions. For this reason, choose the alternative hypothesis before looking at the data.
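A sketch of how each alternative turns into a p-value, assuming the null value \(p = 0.5\) used in the simulation and writing \(\hat{p}_{\text{sim}}\) (our own notation) for a sample proportion simulated under \(H_0\). With 10,000 simulated proportions, each probability is estimated by the fraction of simulated values that satisfy the condition:

```latex
\[
\begin{aligned}
H_A: p < 0.5 &\quad\Rightarrow\quad \text{p-value} = P\left(\hat{p}_{\text{sim}} \le \tfrac{1}{12}\right) \\
H_A: p > 0.5 &\quad\Rightarrow\quad \text{p-value} = P\left(\hat{p}_{\text{sim}} \ge \tfrac{1}{12}\right) \\
H_A: p \neq 0.5 &\quad\Rightarrow\quad \text{p-value} = P\left(\left|\hat{p}_{\text{sim}} - 0.5\right| \ge \left|\tfrac{1}{12} - 0.5\right|\right)
\end{aligned}
\]
```

The two-sided version counts outcomes at least as far from 0.5 as \(\hat{p}\) in either direction; doubling the smaller one-sided p-value is another common convention, and for a symmetric null like this one the two give similar answers.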
Exercise 2: Find the p-value corresponding to each of the three alternatives:
# code here
According to a National Geographic poll:
Fifty-two percent of Americans prefer dogs, 21 percent prefer cats, and 27 percent aren’t sure which species they like better.
For the following practice exercises, let’s make the horribly simplifying assumption that the 27 percent who could not decide are evenly split. In other words, assume that 65.5 percent of Americans prefer dogs and 34.5 percent prefer cats.
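Splitting the undecided 27 percent evenly gives each side an extra 13.5 percentage points, which is where the 65.5 percent figure used below comes from:

```latex
\[
\begin{aligned}
\text{dogs: } 52\% + \tfrac{27\%}{2} &= 52\% + 13.5\% = 65.5\% \\
\text{cats: } 21\% + \tfrac{27\%}{2} &= 21\% + 13.5\% = 34.5\%
\end{aligned}
\]
```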
Last week you completed a survey on whether you prefer cats or dogs (but you could only pick one). Let’s load that survey data below:
survey = read_csv("https://sta199-sp22-003.netlify.app/class_data/sta199-sp22-form.csv")
Let’s see if we believe that 65.5% of Americans prefer dogs to cats.
Exercise 3: To begin, write down a null and an alternative hypothesis (you can choose the alternative).
Next, find the proportion of people in our dataset who prefer dogs.
# code here
Exercise 4
Simulate, using rbinom, 10,000 datasets (each the same size as our survey sample) under the null, i.e. assuming the null is true. Save your result inside a data frame.
set.seed(4)
# code here
Is a dog outcome a “success” (1) or “failure” (0) based on how you set up your simulation?
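As a reminder of how the arguments work: rbinom(n, size, prob) returns n draws, each the number of successes in size independent trials with success probability prob. Whatever outcome prob refers to is the “success”, so setting prob = 0.655 makes “prefers dogs” the success. In this sketch, n_respondents = 50 is a made-up placeholder, not the actual number of survey responses; substitute the size of your own data.

```r
set.seed(4)

# HYPOTHETICAL placeholder: replace with the number of responses in your survey data
n_respondents = 50

# Each draw is the number of "dog" responses (successes) among n_respondents people,
# assuming each person independently prefers dogs with probability 0.655 (the null)
null_dogs = rbinom(10000, size = n_respondents, prob = 0.655)

# Store the simulated counts and the corresponding proportions in a data frame
null_survey = data.frame(
  dogs = null_dogs,
  p = null_dogs / n_respondents
)

head(null_survey)
```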
Exercise 5
Find the p-value using your observed statistic and the null distribution above. Do you reject or fail to reject the null?
# code here
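The p-value computations from this lab can be wrapped in a small helper. This is a minimal sketch: simulated_p_value() is a hypothetical function, not part of base R or the tidyverse, and its two-sided rule (distance from the null value) is one common convention. It is demonstrated on the coin-flip simulation; for the survey, you would pass your simulated proportions, your observed proportion, and 0.655 as the null value.

```r
# Hypothetical helper (not part of base R or the tidyverse): estimate a p-value as
# the fraction of simulated null statistics at least as extreme as the observed one.
simulated_p_value = function(null_stats, observed, null_value,
                             alternative = c("less", "greater", "two.sided")) {
  alternative = match.arg(alternative)
  # Small tolerance so values that tie with the observed statistic are not lost
  # to floating-point rounding (e.g. 11/12 - 0.5 vs 0.5 - 1/12)
  tol = 1e-9
  if (alternative == "less") {
    mean(null_stats <= observed + tol)
  } else if (alternative == "greater") {
    mean(null_stats >= observed - tol)
  } else {
    # Two-sided: at least as far from the null value as the observed statistic
    mean(abs(null_stats - null_value) >= abs(observed - null_value) - tol)
  }
}

# Usage on the coin-flip simulation: proportions of heads in 10,000 sets of 12 fair flips
set.seed(1)
null_p = rbinom(10000, 12, 0.5) / 12
simulated_p_value(null_p, observed = 1/12, null_value = 0.5, alternative = "less")
simulated_p_value(null_p, observed = 1/12, null_value = 0.5, alternative = "two.sided")
```

Compare the result with a significance level chosen in advance (0.05 is a common choice): reject the null if the p-value is below it, otherwise fail to reject.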