class: center, middle, inverse, title-slide

# Project Proposal
### Spring 2022

---

## Project proposal steps

1. Find a data set that satisfies the guidelines ([click here for guidelines and some sites with data](https://sta199-sp22-003.netlify.app/project/project.html))

2. Write about:
    - the source of the data
    - when and how it was originally collected (by the curator, not necessarily how you found the data)
    - a brief description of the observations

3. Choose 1-2 research questions

4. `glimpse` the data

---

## Ex: Introduction and Data

Data set #1: NC Courage Homefield Advantage

Our first data set comes from the [National Women's Soccer League (NWSL) GitHub](https://github.com/adror1/nwslR) and was sourced from [nwslsoccer.com](https://www.nwslsoccer.com/).

The data set contains 78 observations (soccer games) played by the NC Courage over three seasons: 2017, 2018, and 2019. There are 10 variables in this data set. Some of the variables we care about are `home_team`, `away_team`, and `result` (of the game).

---

## Ex: Research question(s):

Does NC Courage have a home-field advantage? We hypothesize that NC Courage is more likely to win on its home field than on another team's field.

- To answer this question we will use information about the `home_team` and the `result` of the game.

Does winning propagate winning? When NC Courage wins a game, does that increase the probability of winning the very next game?

- To answer this question we will use information about the `result` of the game and the `game_number`.
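
A minimal sketch of how the home-field question might be explored with `dplyr`, assuming the data have the `home_team` and `result` columns described above. The `courage_sample` data frame is a small stand-in typed in from the first eight games of the 2017 season, and `location`, `n_games`, and `win_rate` are illustrative names that are not part of the original data.

```r
library(dplyr)

# Stand-in for the full `courage` data: the first eight games of 2017.
courage_sample <- data.frame(
  game_number = 1:8,
  home_team   = c("WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI"),
  away_team   = c("NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC"),
  result      = c("win", "win", "win", "win", "loss", "loss", "win", "loss")
)

# Label each game as home or away for NC, then compare win proportions.
win_by_location <- courage_sample %>%
  mutate(location = if_else(home_team == "NC", "home", "away")) %>%
  group_by(location) %>%
  summarize(n_games = n(), win_rate = mean(result == "win"))

win_by_location
```

The same pipeline should run on the full `courage` data frame in place of `courage_sample`. Eight games are far too few to draw a conclusion, so this only illustrates the shape of the summary.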
---

## Ex: Glimpse

```r
glimpse(courage)
```

```
## Rows: 78
## Columns: 10
## $ game_id     <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
## $ game_date   <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
## $ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ home_team   <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
## $ away_team   <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
## $ opponent    <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
## $ home_pts    <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
## $ away_pts    <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
## $ result      <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
## $ season      <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
```
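
With the `result`, `game_number`, and `season` columns from the glimpse, the question of whether a win makes a win in the next game more likely could be approached with `dplyr::lag()`. This is only a sketch under stated assumptions: `courage_sample` is a stand-in holding the first eight 2017 results, `prev_result`, `n_games`, and `win_rate` are illustrative names, and grouping by `season` is a design choice so that the last game of one season is not treated as the game before the next season's opener.

```r
library(dplyr)

# Stand-in for the full `courage` data: the first eight games of 2017.
courage_sample <- data.frame(
  game_number = 1:8,
  result      = c("win", "win", "win", "win", "loss", "loss", "win", "loss"),
  season      = 2017
)

win_after_previous <- courage_sample %>%
  arrange(season, game_number) %>%
  group_by(season) %>%                  # do not carry a result across seasons
  mutate(prev_result = lag(result)) %>% # result of the game played just before
  ungroup() %>%
  filter(!is.na(prev_result)) %>%       # a season's first game has no previous game
  group_by(prev_result) %>%
  summarize(n_games = n(), win_rate = mean(result == "win"))

win_after_previous
```

Comparing `win_rate` between the rows where `prev_result` is `"win"` and `"loss"` gives a first descriptive look at the question. Replacing `courage_sample` with the full `courage` data frame should apply the same steps to all 78 games.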