By the end of today you will use pivot_wider() and kable() to build and display a contingency table.
library(tidyverse)
library(knitr)
#sta199 <- read_csv()
For this Application Exercise, we will look at our newly collected data.
Data includes:
year: Year in school
animal: Whether you prefer cats or dogs
tv: Favorite TV genre
major: Probable major (statistical science or not)
Give two examples of an event from the data set.
Let’s take a look at favorite TV genre. Note that we have categorized genres so that each person can only have one favorite genre.
# code here
# code here
How large is the sample space of any individual’s response? Can we check this in R?
# code here
# code here
# code here
# code here
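One possible way to check the size of the sample space in R is sketched below. The data frame toy and its values are invented for illustration, since the class data is not loaded here. Keep in mind that counting distinct responses only shows the genres that were actually reported, which may be fewer than the genres that were possible.

```r
library(dplyr)

# Hypothetical stand-in for the class data; these responses are made up.
toy <- data.frame(
  tv = c("comedy", "drama", "comedy", "reality", "drama", "comedy")
)

# List each distinct genre that appears in the responses
observed_genres <- toy %>%
  distinct(tv)

# Count how many distinct genres appear
n_genres <- toy %>%
  summarise(n = n_distinct(tv)) %>%
  pull(n)

observed_genres
n_genres
```

With the real data, the same two steps would be applied to sta199 instead of toy.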
Now let’s make a table looking at the relationship between year and favorite TV genre.
# sta199 %>%
# count(year, tv)
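To see what count() returns for two variables, here is a small sketch on invented data. The data frame toy and every value in it are assumptions for illustration only. The result has one row per combination of year and tv that actually occurs, with the frequency in a column named n.

```r
library(dplyr)

# Hypothetical stand-in for the class data; these responses are made up.
toy <- data.frame(
  year = c("First-year", "First-year", "Sophomore",
           "Sophomore", "Sophomore", "Junior"),
  tv   = c("comedy", "drama", "comedy",
           "comedy", "reality", "drama")
)

# One row per observed (year, tv) combination, with its frequency in n
long_counts <- toy %>%
  count(year, tv)

long_counts
```

Combinations that never occur (for example, a Junior who prefers comedy in this toy data) do not get a row at all, which is why the table is described as being in a long format.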
We’ll reformat the data into a contingency table, a table frequently used to study the association between two categorical variables. In this contingency table, each row will represent a year, each column will represent a TV genre, and each cell is the number of students who have a particular combination of year and favorite TV genre.
To make the contingency table, we will use a new function in tidyr called pivot_wider(). It will take the data frame produced by count() that is currently in a “long” format and reshape it to be in a “wide” format.
We will also use the kable() function in the knitr package to neatly format our new table.
# sta199 %>%
#   count(year, tv) %>%
#   pivot_wider(id_cols = year,      # column that identifies each row
#               names_from = tv,     # column whose values become the new column names
#               values_from = n,     # values used for each cell
#               values_fill = 0) %>% # fill cells that have no observations with 0
#   kable()                          # neatly display the results
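The reshaping step can be tried on a small invented data set before running it on the class data. In the sketch below, toy and its values are assumptions for illustration. The year column identifies the rows, so it is the only column given to id_cols, and tv supplies the new column names.

```r
library(dplyr)
library(tidyr)
library(knitr)

# Hypothetical stand-in for the class data; these responses are made up.
toy <- data.frame(
  year = c("First-year", "First-year", "Sophomore",
           "Sophomore", "Sophomore", "Junior"),
  tv   = c("comedy", "drama", "comedy",
           "comedy", "reality", "drama")
)

# Long format: one row per observed (year, tv) combination
long_counts <- toy %>%
  count(year, tv)

# Wide format: one row per year, one column per genre
wide <- long_counts %>%
  pivot_wider(id_cols = year,      # rows of the contingency table
              names_from = tv,     # genres become column names
              values_from = n,     # counts fill the cells
              values_fill = 0)     # combinations never observed get 0

# Neatly display the contingency table
kable(wide)
```

Each row of wide sums to the number of students in that year, and each genre column sums to the number of students who chose that genre.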
For each of the following exercises:
Calculate the probability using the contingency table above.
Then write code to check your answer using the sta199 data frame and dplyr functions.
# code here
# code here
# code here
# code here
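As a rough sketch of how a probability read off a contingency table might be checked with dplyr, the example below uses invented data. The data frame toy, its values, and the two events chosen are assumptions for illustration and are not the exercises themselves. The idea is that the mean of a logical condition is the proportion of rows where the condition is true.

```r
library(dplyr)

# Hypothetical stand-in for the class data; these responses are made up.
toy <- data.frame(
  year = c("First-year", "First-year", "Sophomore",
           "Sophomore", "Sophomore", "Junior"),
  tv   = c("comedy", "drama", "comedy",
           "comedy", "reality", "drama")
)

# Probability that a randomly selected student prefers comedy:
# 3 of the 6 toy students
p_comedy <- toy %>%
  summarise(p = mean(tv == "comedy")) %>%
  pull(p)

# Probability that a randomly selected student is a Sophomore
# AND prefers comedy: 2 of the 6 toy students
p_soph_and_comedy <- toy %>%
  summarise(p = mean(year == "Sophomore" & tv == "comedy")) %>%
  pull(p)

p_comedy
p_soph_and_comedy
```

With the real data, the same pattern would be applied to sta199 and compared against the value computed by hand from the contingency table.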
Population: the entire group you want to learn about. Often, it is useful to think of the population as the “truth.”
Sample: the subset of the population that you actually observe and from which you draw inferences.
pivot_wider() and pivot_longer()