library(tidyverse)
library(tidymodels)
report.rmd
and report.pdf
By the end of today, you will…
We’ll continue examining the Palmer penguin dataset.
data(penguins)
Use ?penguins or click here for more info about the dataset.
penguins %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g, color = island)) +
  geom_point() +
  theme_bw() +
  geom_smooth(method = 'lm', se = FALSE) +
  labs(x = "Bill length (mm)", y = "Body mass (g)", title = "Body mass vs bill length by island", color = "Island")
main_fit = linear_reg() %>%
  set_engine("lm") %>%
  fit(body_mass_g ~ bill_length_mm + island, data = penguins)
main_fit %>%
  tidy()
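The associated linear model:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \]
where \(y\) is body mass (g), \(x_1\) is bill length (mm), and \(x_2\) and \(x_3\) are indicator (0/1) variables for island.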
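The model above has three slopes even though the formula names only two predictors, because R expands a categorical predictor into indicator columns. A minimal sketch of that expansion, using a hypothetical toy factor `g` (the data frame `toy` and its levels are made up for illustration and are not part of the penguins data):

```r
# Hypothetical toy data: a three-level factor, not the penguins data
toy = data.frame(g = factor(c("a", "b", "c", "a")))

# model.matrix() shows the design matrix lm() would build from a formula
design = model.matrix(~ g, data = toy)
colnames(design)
```

Three levels yield two indicator columns alongside the intercept.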
What about Biscoe island?
interaction_fit = linear_reg() %>%
  set_engine("lm") %>%
  fit(body_mass_g ~ bill_length_mm * island, data = penguins)
interaction_fit %>%
  tidy()
What changed in the code?
What does the full model look like?
Interpret the bill_length_mm:islandDream slope.
Before examining the numeric value, which model do you think has the larger \(R^2\)? Why?
Compare adjusted \(R^2\) between models. Which model do you prefer?
# code here
Adding predictors never decreases \(R^2\), even when the new predictors carry no real information about the response. For this reason, use adjusted \(R^2\) when comparing models with different numbers of predictors.
Adjusted \(R^2\) penalizes the number of predictors in the model, so it increases only when a new variable improves the fit enough to offset the penalty.
Adjusted \(R^2\):
\[ 1 - (1 - R^2) \frac{n-1}{n - k - 1} \]
where \(n\) is the number of observations (in the data) and \(k\) is the number of predictors (in the model).
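To make the formula concrete, here is a small sketch that computes adjusted \(R^2\) by hand and checks it against the value R reports. It uses the built-in mtcars data and base R's `lm()` rather than the penguins models; the helper `adj_r2()` and the two models are hypothetical names chosen for illustration.

```r
# Hypothetical illustration on the built-in mtcars data (base R only)
small = lm(mpg ~ wt, data = mtcars)
large = lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R^2 from the formula: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 = function(fit) {
  n = nobs(fit)                  # number of observations used in the fit
  k = length(coef(fit)) - 1      # number of predictors (coefficients minus intercept)
  r2 = summary(fit)$r.squared
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# R^2 cannot go down when a predictor is added
summary(large)$r.squared >= summary(small)$r.squared

# The hand computation agrees with the value summary() reports
all.equal(adj_r2(large), summary(large)$adj.r.squared)
```

Note that \(k\) counts slope coefficients, so a categorical predictor with several levels contributes more than one to \(k\).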
What’s linear about linear regression? The coefficients. We can transform the data in any way we like.
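As a sketch of what "linear in the coefficients" means, the example below simulates a curved relationship and still fits it with `lm()` by transforming the predictor. The data frame `sim`, the quadratic form, and the coefficient values 1 and 2 are assumptions made up for this illustration; they say nothing about the example dataset used below.

```r
# Simulated (hypothetical) data with a curved relationship: y = 1 + 2 * x^2 + noise
set.seed(1)
sim = data.frame(x = runif(100, 1, 10))
sim$y = 1 + 2 * sim$x^2 + rnorm(100)

# Still a linear model: y is linear in the coefficients, with x^2 as the predictor
curved_fit = lm(y ~ I(x^2), data = sim)
coef(curved_fit)
```

The `I()` wrapper tells the formula to treat `x^2` as arithmetic, so the squared value enters the model as an ordinary predictor.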
example = read_csv("data/example.csv")
example %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  theme_bw() +
  geom_smooth(method = 'lm', se = FALSE, color = 'steelblue') +
  labs(x = "X", y = "Y", title = "Naive linear model is a bad fit")
# code here
What function better describes the relationship between \(x\) and \(y\) above?
Transform the predictor \(x\) and plot y versus the transformed predictor below.
# code here
Is the relationship between body mass (g) and bill depth (mm) positive or negative? Build a convincing argument from the data.
Create a linear model of body mass using bill depth and one other predictor of your choosing.
Do you prefer this model to the interaction effects model from exercise 2? Why?