Homework #01: Data Visualization

Due Friday, January 28, at 11:59 PM

Goals

For this assignment you must have at least three commits and all of your code chunks must have meaningful names.

To begin, update your author name in the YAML header of the template R Markdown file. Commit and push.

Clone assignment repo + start new project

A private repository including a template R Markdown file has been created for you.

Packages

We will work with the tidyverse package as usual. We will also use the viridis and ggridges packages.

library(tidyverse)
library(viridis)
library(ggridges)

Data

anes <- read_csv("data/anes2020_subset.csv")

The data for this homework assignment comes from the 2020 American National Election Study.

A subset of the variables is provided here. Some have been recoded, that is, transformed into slightly different variables. You will recode others during this assignment in order to carry out your analysis. The variables are as follows:

All plots should follow the best visualization practices discussed in lecture. Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

In addition, lines of code and narrative should not exceed the 80-character limit. See the Lab #01 instructions for setting a vertical line at 80 characters in your R Markdown file.

  1. How many rows are in the anes dataset? How many columns? Be sure to include code and output to support your response.
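As a minimal sketch of the kind of code that can support a response like this, the example below counts the rows and columns of a small made-up tibble. The object `toy` and its columns are hypothetical and stand in for any data frame; they are not part of the assignment data.

```r
library(tibble)

# Hypothetical toy data; not the assignment's dataset
toy <- tibble(
  id = 1:4,
  group = c("a", "b", "a", "b"),
  score = c(10, 20, 30, 40)
)

n_rows <- nrow(toy)  # number of rows (observations)
n_cols <- ncol(toy)  # number of columns (variables)
dim(toy)             # both at once: rows first, then columns
```

Printing the result inside a named code chunk keeps the code and its output together in the knitted document.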

Now would be a good time to do your first knit, commit, and push.

  2. Create a bar chart showing the ideology of the respondents, with the count on the y-axis. Be sure to include labels. What is the most common ideology? Do respondents tend to be moderate or more ideologically extreme?

  3. Now, let’s examine whether ideologies are different based upon where people live. Make a filled bar plot, showing one bar for each ideology, with the proportion of respondents on the y-axis going from 0 to 1, and the fill determined by urbanrural. You are encouraged but not required to use viridis colors. Remember to include labels.

Where do people of different ideologies tend to live? Does the percentage of non-responses (i.e., people who said NA) vary much by ideology?
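For reference, here is a hedged sketch of how a filled bar plot is typically built in ggplot2, using made-up data. The data frame `toy` and its columns `category` and `place` are hypothetical; `position = "fill"` rescales every bar to a total height of 1, and `scale_fill_viridis_d()` is the discrete viridis scale that ships with ggplot2.

```r
library(ggplot2)

# Hypothetical toy data; column names are illustrative only
toy <- data.frame(
  category = c("A", "A", "A", "B", "B", "C"),
  place = c("city", "rural", "city", "rural", "rural", "city")
)

p_fill <- ggplot(toy, aes(x = category, fill = place)) +
  geom_bar(position = "fill") +  # each bar is rescaled to sum to 1
  scale_fill_viridis_d() +       # discrete viridis palette from ggplot2
  labs(
    title = "Share of places within each category",
    x = "Category",
    y = "Proportion",
    fill = "Place"
  )

p_fill
```

Because every bar has the same height, this kind of plot shows composition within each category and hides how many observations each category contains.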

Now would be a good time to knit, commit, and push again.

  4. How do people view scientists? Create a histogram with the ‘feeling thermometer’ on the x-axis and the number of respondents on the y-axis. Comment on features of the histogram such as skewness and peaks.

  5. Does the ideology of those who have gone hunting or fishing in the past year differ from those who haven’t? To investigate, create side-by-side boxplots of these two groups.

You should start your code with:

anes %>%
  drop_na(hunt_fish) %>%
  mutate(hunted_fished =
           ifelse(hunt_fish == 0,
                  "Did Not Hunt or Fish",
                  "Hunted or Fished"))

Note that drop_na removes observations that are NA for the hunt_fish variable.
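To illustrate what this pattern does, the sketch below applies the same two steps to a small made-up tibble. The names `toy`, `flag`, and `flag_label` are hypothetical and chosen only for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: a 0/1 indicator with one missing value
toy <- tibble(
  id = 1:5,
  flag = c(0, 1, NA, 1, 0)
)

toy_labeled <- toy %>%
  drop_na(flag) %>%  # removes only the rows where flag is NA
  mutate(flag_label = ifelse(flag == 0, "No", "Yes"))

toy_labeled
```

Dropping the missing values first matters here: without it, ifelse() would return NA for those rows, and they would show up as an extra group in any plot that uses the new variable.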

Then construct side-by-side ridgeline density plots using geom_density_ridges().

See the lecture notes and the ggridges vignette for more information and example code.
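As a rough sketch of the basic shape of a ridgeline plot, the example below draws one density curve per group from simulated values. The data frame `toy`, its columns, and the seed are all made up for illustration; ggridges places the numeric variable on the x-axis and the grouping variable on the y-axis.

```r
library(ggplot2)
library(ggridges)

set.seed(1)  # arbitrary seed so the simulated values are reproducible

# Hypothetical toy data: two groups with different centers
toy <- data.frame(
  group = rep(c("Group 1", "Group 2"), each = 100),
  value = c(rnorm(100, mean = 3), rnorm(100, mean = 5))
)

p_ridge <- ggplot(toy, aes(x = value, y = group)) +
  geom_density_ridges(alpha = 0.6) +  # one density curve per level of y
  labs(
    title = "Distribution of value by group",
    x = "Value",
    y = "Group"
  )

p_ridge
```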

Describe what you observe in both plots and what you learn from one plot that you do not see in the other or that adds additional context to the other.

  6. Is there a relationship between a respondent’s education level and how they view scientists? Create a scatterplot of these two variables with education on the x-axis and view of scientists on the y-axis. Then, include a line of best fit with the option method = "lm". Do you think this is an especially useful visualization? Why or why not?

Now would be a good time to knit, commit, and push again.

  7. For this problem, you should again make a scatterplot where you look at the relationship between education as the x-variable and respondents’ views of scientists as the y-variable.

There are a lot of data points in this dataset. For this exercise, you are going to begin by taking a sample, using the code below:

set.seed(18)
anes2 <- anes %>%
  sample_frac(.10)

This code takes a random 10% subset of the dataset. Calling set.seed() first ensures that the same subset is drawn each time the code is run.
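The effect of the seed can be checked directly. The sketch below draws the same 10% sample twice from a made-up tibble, resetting the seed before each draw; `toy`, `first_draw`, and `second_draw` are hypothetical names used only here.

```r
library(dplyr)

# Hypothetical toy data with 100 rows
toy <- tibble(id = 1:100)

set.seed(18)
first_draw <- toy %>%
  sample_frac(0.10)

set.seed(18)  # resetting the seed replays the same random draw
second_draw <- toy %>%
  sample_frac(0.10)

identical(first_draw, second_draw)  # TRUE: both draws pick the same rows
nrow(first_draw)                    # 10, i.e. 10% of 100 rows
```

Recent versions of dplyr mark sample_frac() as superseded by slice_sample(prop = ...); both remain available, and this assignment uses sample_frac().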

Make a scatterplot using this subset and facet by whether the person hunted or fished in the past year. Include labels in words identifying which group the subplot represents.

Next, add a geom_smooth() layer with method = "lm" to each plot, and set se = FALSE to omit the confidence bands surrounding the line. Describe what you observe.
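The pieces described above (word labels for the facets, one fitted line per panel, no bands) fit together roughly as in the following sketch on simulated data. Everything in it is an assumption made for illustration: the data frame `toy`, the columns `x`, `y`, `flag`, and `flag_label`, and the seed.

```r
library(ggplot2)

set.seed(2)  # arbitrary seed so the simulated noise is reproducible

# Hypothetical toy data: a 0/1 indicator and a roughly linear relationship
toy <- data.frame(
  x = rep(1:10, times = 2),
  flag = rep(c(0, 1), each = 10)
)
toy$y <- 2 + 0.5 * toy$x + rnorm(20)

# Recode the indicator into words so the facet strips are readable
toy$flag_label <- ifelse(toy$flag == 0, "Group: No", "Group: Yes")

p_facet <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # straight-line fit, no bands
  facet_wrap(~ flag_label) +                # one panel per group
  labs(
    title = "Relationship between x and y by group",
    x = "x",
    y = "y"
  )

p_facet
```

Faceting on a recoded character variable, rather than on the raw 0/1 indicator, is one simple way to get descriptive panel titles.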

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Upload only your PDF document to Gradescope. Before you submit the uploaded document, mark the page where the answer to each exercise is located. If an answer spans multiple pages, mark all of them. Associate the “Overall” section with the first page.

Grading

Total: 50 pts.