Repo admin

From the first homework assignment, you should now have a Homework folder on your computer containing a subfolder HW1 with the first assignment. For the coming assignments, you will need data that is made available in the repo at https://github.com/MT4007-HT21/HW_data. Clone this repo into your Homework folder by following the instructions detailed in the first homework, and call the resulting subfolder HW_data. Then open (or create) the file called .gitignore in your repository and add HW_data on a line of its own. Save, commit and push the changes.

Summary of repos and folders

  • The Homework folder is connected to your HW_username repository on GitHub. When you want to push your work to GitHub, open the R-project in this folder, then commit and push. It has a subfolder Homework/HW_data and one subfolder for each homework (Homework/HW1, Homework/HW2, …). It also contains the README.md file where you insert links to your homeworks.
  • The Homework/HW_data folder is connected to https://github.com/MT4007-HT21/HW_data. When a new homework is issued, you might need to pull the most recent changes of this repository. Open the R-project in this folder and pull from GitHub. You should never change the files in this folder. If you do so by mistake, delete it and make a new clone.
  • The Homework/HW[1-6] folders are where you keep the R Markdown and Markdown documents for each homework.

Deadline

The deadline for the homework is 2021-11-14 at 23:59. Submission occurs as usual by creating a new issue with the title “HW2 ready for grading!” in your repository. Your peer review will be assigned on 2021-11-15 and is due 2021-11-16 at 12:00.

HW instructions

Solutions to the following tasks should be presented in an R Markdown document with output: github_document. Both the R Markdown document (.Rmd file) and the compiled Markdown document (.md file), as well as any figures needed to render the Markdown file properly on GitHub, need to be pushed as part of the HW2 subdirectory. Code should be written clearly in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. For example, code should be easily readable and avoid unnecessary repetition of variable names.
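As a rough illustration of the point about repeated variable names, the sketch below performs the same two steps twice on a small made-up data frame. The names `flats`, `price` and `area` and all values are invented for this example and are not part of the homework data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration; not taken from HW_data.
flats <- tibble(
  price = c(2.1, 3.4, 2.9, 4.0),
  area  = c(35, 60, 48, 72)
)

# Repetitive version: the data frame name is spelled out for every column.
flats_base <- flats
flats_base$price_per_area <- flats_base$price / flats_base$area
flats_base <- flats_base[flats_base$area > 40, ]

# Piped version: the data frame is named once, columns are referred to directly.
flats_tidy <- flats %>%
  mutate(price_per_area = price / area) %>%
  filter(area > 40)
```

Both versions give the same rows and columns; the piped one is usually easier to read and to extend with further steps.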

Your submitted code should be self-contained and results should be reproducible for someone having access to the HW_data directory. Once you are ready to submit and before the deadline, use the same procedure as for HW1: open an issue in your HW_<username> repository with the title “HW2 ready for grading!”.

Exercise 1: Apartment prices

The file HW_data/booli_sold.csv contains sales data on 158 apartments in Ekhagen (next to Lappis) collected from Booli’s open API.

Tasks

  1. Illustrate how Soldprice depends on Livingarea with a suitable figure.
  2. Illustrate trends in Soldprice / Livingarea over the period.
  3. Illustrate an aspect of data using a boxplot (geom_boxplot).
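The kinds of figures asked for above could be built along the following lines. This is only a sketch on invented data: the data frame `toy_sales`, its columns `area`, `price` and `rooms`, and all values are assumptions made for the example, so the column names in the real file will differ.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
toy_sales <- data.frame(
  area  = c(30, 45, 52, 61, 75, 88),
  price = c(2.0, 2.8, 3.1, 3.9, 4.4, 5.2),
  rooms = factor(c(1, 2, 2, 3, 3, 4))
)

# Scatter plot of price against area, with a straight-line trend added.
p_scatter <- ggplot(toy_sales, aes(x = area, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Living area", y = "Sold price")

# Boxplot of price per unit area, grouped by number of rooms.
p_box <- ggplot(toy_sales, aes(x = rooms, y = price / area)) +
  geom_boxplot() +
  labs(x = "Number of rooms", y = "Sold price / living area")
```

Printing `p_scatter` or `p_box` in an R Markdown chunk renders the figure. Explicit axis labels through `labs()` matter here, since the peer review checks them.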

Exercise 2: Folkhälsomyndigheten COVID cases and why Excel might not be your friend

The file HW_data/Folkhalsomyndigheten_Covid19.xlsx contains data on COVID cases in Sweden. The data was obtained through Folkhälsomyndigheten’s webpage on the 1st of October 2020. Because we downloaded it manually on a specific date, reproducibility might be an issue, since the COVID case counts might have been updated since then.

Tasks

Answer the questions listed below.

Data wrangling

  1. Open the .xlsx file in any way of your choosing and have a look at the numbers. What does the file contain and what does the data represent? Use Folkhälsomyndigheten’s webpage to gather the necessary information. State which information is in which sheet. Depending on your OS and how you opened the file to begin with, you might get some information about the sheets using the function excel_sheets.
  2. From the readxl package, use an appropriate read_* function to read all sheets in the .xlsx file and store them as tibbles (data.frames). The read_* function will simply be referred to as the “read function” in the coming questions. When you read these sheets, you should see a lot of warning messages. We will investigate those in the coming questions.
  3. Display the first and the last five rows of the second sheet called “Antal avlidna per dag” using knitr::kable and head. What are the column names? Does anything seem strange? Using the argument n_max in the read function, remove the last row.
  4. In the sheet corresponding to “Veckodata Kommun_stadsdel”, look at the columns and their types. What type has your read_* function parsed for the column Stadsdel? Read the documentation of the appropriate function, explain why this happens, and describe how to fix it.
  5. In the same sheet there are two columns called tot_antal_fall and nya_fall_vecka. What is the type of these variables, and why have they been parsed as such? Correct them (in some way) so that they become numeric variables.
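The readxl calls mentioned in the questions above can be tried out on the example workbook that ships with the package, without touching the homework file. The sketch below is an illustration under that assumption: `readxl_example("datasets.xlsx")` is a stand-in for the real path, and the vector `raw_counts` with its "N/A" placeholder is invented to show one possible way of turning text into numbers.

```r
library(readxl)

# Example workbook bundled with readxl; a stand-in for the homework file.
path <- readxl_example("datasets.xlsx")

# List the sheet names without opening the file in a spreadsheet program.
sheet_names <- excel_sheets(path)

# Read every sheet into a named list of tibbles.
all_sheets <- lapply(
  setNames(sheet_names, sheet_names),
  function(sheet) read_excel(path, sheet = sheet)
)

# n_max limits the number of data rows read (the header row is not counted).
first_rows <- read_excel(path, sheet = sheet_names[1], n_max = 5)

# Inspect the column types that were guessed for one sheet.
sapply(all_sheets[[1]], class)

# Hypothetical character column: "N/A" stands in for any non-numeric entry
# that forces a column to be read as text.
raw_counts <- c("12", "7", "N/A", "30")
counts <- as.numeric(ifelse(raw_counts == "N/A", NA, raw_counts))
```

Replacing the placeholder with `NA` before calling `as.numeric` avoids a coercion warning. Whether a missing value is the right interpretation of such an entry depends on what the data provider says it means.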

Statistics and plotting

  1. Using the summarise, across and where functions, reproduce the number of COVID cases for each region as well as for the total (here named Totalt_antal_fall). Notice that this information can be found in another part of the sheet. What is the total number of cases? Which region has had the most cases so far? Which has had the fewest? Argue why looking at counts might be misleading when comparing regions.
  2. Plot the total number of diseased people since the 15th of March in a line chart.
  3. Plot tot_antal_fall and nya_fall_vecka against their corresponding week number using geom_col.
  4. How many rows are there in the data frame used to produce the figures in task 3? Compare that to the number of weeks/columns plotted in task 3. What did ggplot do without telling us? Hint: Read the documentation.
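The combination of summarise, across and where named in the first task can be tried on a small invented table first. In the sketch below, `toy_cases` and its columns `region_a`, `region_b` and `region_c` are hypothetical and unrelated to the real sheet.

```r
library(dplyr)

# Hypothetical toy data: daily counts for three invented regions.
toy_cases <- tibble(
  date     = as.Date("2020-03-01") + 0:3,
  region_a = c(1, 4, 2, 0),
  region_b = c(0, 3, 5, 1),
  region_c = c(2, 2, 2, 2)
)

# Sum every numeric column; the Date column is skipped by where(is.numeric).
totals <- toy_cases %>%
  summarise(across(where(is.numeric), sum))
```

Here `totals` has one row with the column sums 7, 9 and 8. Selecting columns by type means no region name has to be typed out, which helps when a sheet has many region columns.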

Peer review

After the deadline has passed, you will be given access to another student’s repository on GitHub. You should provide summary feedback by responding to the “HW2 ready for grading!” issue. Copy the following checklist and use it in your review:

* Is the homework complete, i.e. are all questions in the homework answered?

* Is there a working link from the main repository `README.md` to `HW2.md`?

* Is any code showing? If yes, is there any text about it?

* Do the figures have proper axis labels, and do you understand them?

* Did you get any different results in your submission?