Lab #02: Data Visualization

due Monday January 24 11:59 PM

Goals

Getting started

Go to https://github.com/sta199-s22 and find your lab02 repo. You must be logged in to GitHub! Let your TA know if you do not have a lab02 repo, and send your github username to Prof. Fisher via slack/email.

You will write your answers in the lab02.Rmd template file.

Update the YAML header with your name and today’s date. Then, knit the document and make sure the resulting PDF file has the correct date. Stage, commit, and push your changes.

Packages

We will be using tidyverse and viridis packages to make plots in R. Our data comes from the fivethirtyeight package.

library(tidyverse)
library(fivethirtyeight)
library(viridis)

Bechdel Test

In 2017, FiveThirtyEight published an article on the Bechdel Test, which is used to measure gender imbalance in movies. The test examines whether the movie has at least two female characters who have a conversation that is not about a man.

The data for this lab is inside the fivethirtyeight package in a dataset called bechdel.

All plots should follow best visualization practices; plots should include:

In addition, code and narrative should not exceed the 80 character limit. To help police this, add a vertical line at 80 characters by clicking “Tools” \(\rightarrow\) “Global Options” \(\rightarrow\) “Code” \(\rightarrow\) “Display”, then set “Margin Column” to 80 and click “Apply”.

Your assignment should have at least three meaningful commits.

  1. Make a histogram of the domestic gross in 2013 dollars. (There is a specific variable in the dataset for this with an intuitive name.) Please set the number of bins at 10. Please label your axes and give the plot a title. Does it appear that there are any outliers? Does the distribution appear to be symmetric or skewed?

Before going to Exercise 2, now would be a good time to do your first knit and then commit and push.

  1. Generate a scatterplot of a film’s budget in 2013 dollars (budget_2013) as your x variable versus 2013 domestic gross (domgross_2013) as your y variable with points colored by binary, which indicates if it passed the Bechdel Test. Please label your axes and legend and give the plot a title.

(Hint: if you would like to use the viridis palette, the line scale_color_viridis(discrete = TRUE, option = "C", name = "Passed Bechdel Test?") will be useful to your code.)

  1. Describe what you observe in the plot for Exercise 2. Do you observe the same pattern for movies that pass and those that do not?

  2. Now, examine the relationship between same two variables, with a separate plot for those that passed and those that did not, hint: use facet. Please label your axes and add an additional layer, a line of best fit without standard errors (i.e., please set se = FALSE). Which plot do you prefer? Briefly explain your choice.

Now would be a good time to knit, commit, and push, again.

  1. While it might seem straightforward as to whether a movie passes the test or not, sometimes it is less clear. The variable clean_test divides movies into five separate categories in terms of whether they pass. “Dubious” and “OK” indicate some degree of passage, while the other three categories indicate some degree of failure.

Here we’ll answer the question: “Does the international gross of a movie depend upon whether the movie passes the Bechdel Test”?

To find out, create side-by-side boxplots of a movie’s 2013 international gross (intgross_2013) for whether it passed the test (clean_test). Briefly comment on what you notice. Which categories have movies that have grossed above 3 billion dollars? How do you know based upon the plot?

  1. Has the percentage of movies that pass the Bechdel Test increased over time?

Please create a segmented bar chart with one bar per year, each bar going from 0 - 1, with the fill determined by whether the movie passed the Bechdel Test. What do you notice?

Before going to the last exercise, try knitting, committing, and pushing one more time.

  1. Recreate the plot below. This page will be helpful in determining the theme. The size of the points is 0.75. (Please note: the points are colored different shades of blue.)

Note: The y-axis represents domestic gross in 2013 dollars.

Submission

Once you are fully satisfied with your lab, Knit to PDF to create a PDF document.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. we will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a PDF file to the Gradescope page before the submission deadline for full credit.

Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.

Grading

Total: 50 pts