From the first homework assignment, you should now have a `Homework` folder on your computer containing a subfolder `HW1` with the first assignment. For the coming assignments, you will need data that is made available in the repo at https://github.com/MT4007-HT21/HW_data. Clone this repo by following the instructions detailed in the first homework. Call this subfolder `HW_data`. Add (or open) the file called `.gitignore` in your repository and add `HW_data` on a line of its own. Save, commit and push the changes.
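The `.gitignore` step can also be done from the R console. The following is a minimal sketch, not part of the original instructions: it works in a temporary directory (a stand-in for your real `Homework` folder, so nothing in an actual repository is touched) and appends the entry only if it is not already listed.

```r
# Stand-in for the real Homework repository; replace with your own path.
repo <- file.path(tempdir(), "Homework")
dir.create(repo, showWarnings = FALSE)

ignore_file <- file.path(repo, ".gitignore")
entry <- "HW_data"

# Read the current entries, if the file already exists.
existing <- if (file.exists(ignore_file)) readLines(ignore_file) else character()

# Append the entry on its own line unless it is already listed.
if (!entry %in% existing) {
  write(entry, file = ignore_file, append = TRUE)
}

# Show the resulting contents of the file.
readLines(ignore_file)
```

The change still has to be committed and pushed afterwards, exactly as described above.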
* The `Homework` folder is connected to your `HW_username` repository on GitHub. When you want to push your work to GitHub, open the R-project in this folder and commit and push. It has a subfolder `Homework/HW_data` and one subfolder for each homework (`Homework/HW1`, `Homework/HW2`, …). It also contains the `README.md` file where you insert links to your homeworks.
* The `Homework/HW_data` folder is connected to https://github.com/MT4007-HT21/HW_data. When a new homework is issued, you might need to pull the most recent changes of this repository. Open the R-project in this folder and pull from GitHub. You should never change the files in this folder. If you do so by mistake, delete the folder and make a new clone.
* The `Homework/HW[1-6]` folders. This is where you keep the R Markdown and Markdown documents for each homework.

Deadline for the homework is 2021-11-14 at 23:59. Submission occurs as usual by creating a new issue with the title “HW2 ready for grading!” in your repository. Your peer review will be assigned on 2021-11-15 and is due 2021-11-16 at 12:00.
Solutions to the following tasks should be presented in an R Markdown document with `output: github_document`. Both the R Markdown document (`.Rmd` file) and the compiled Markdown document (`.md` file), as well as any figures needed for properly rendering the Markdown file on GitHub, need to be pushed as part of the `HW2` subdirectory. Code should be written clearly in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. As an example, code should be easily readable and avoid unnecessary repetition of variable names.
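As a rough sketch of what such a document looks like and how it can be compiled from the console (in RStudio the Knit button does the same job), the snippet below writes a minimal `.Rmd` file to a temporary directory and renders it with `rmarkdown::render`. The file name `HW2.Rmd` and its contents are placeholders, and rendering assumes that the `rmarkdown` package and pandoc are installed.

```r
library(rmarkdown)

# Placeholder document; a real homework contains your own text and code chunks.
rmd_file <- file.path(tempdir(), "HW2.Rmd")
fence <- strrep("`", 3)  # the three backticks that delimit a code chunk

writeLines(
  c(
    "---",
    "title: \"HW2\"",
    "output: github_document",
    "---",
    "",
    paste0(fence, "{r}"),
    "1 + 1",
    fence
  ),
  rmd_file
)

# Compile using the output format declared in the YAML header.
render(rmd_file, quiet = TRUE)

# The compiled Markdown file is written next to the .Rmd file.
file.exists(file.path(tempdir(), "HW2.md"))
```

Any figures produced by the chunks end up in a separate folder next to the `.md` file, which is why that folder needs to be pushed as well.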
Your submitted code should be self-contained and results should be reproducible for someone having access to the `HW_data` directory. Once you are ready to submit and before the deadline, use the same procedure as for HW1: open an issue in your `HW_<username>` repository with the title “HW2 ready for grading!”.
The file `HW_data/booli_sold.csv` contains sales data on 158 apartments in Ekhagen (next to Lappis) collected from Booli’s open API.
* … `geom_boxplot`).

The file `HW_data/Folkhalsomyndigheten_Covid19.xlsx` contains data on COVID cases in Sweden. The data was obtained through Folkhälsomyndigheten’s webpage on 1 October 2020. Because it was downloaded manually on a specific date, reproducibility might be an issue, since the COVID case counts may have been updated since then.
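The general pattern for inspecting and reading a multi-sheet workbook with `readxl` can be sketched as follows. To keep the example runnable without the course data, it uses `datasets.xlsx`, an example file bundled with `readxl`; for the homework you would point `path` at the COVID workbook instead (the relative location depends on where your `.Rmd` file lives). The `purrr` helpers are one choice among several; a `for` loop or `lapply` would work too.

```r
library(readxl)
library(purrr)

# Example workbook shipped with readxl; swap in the path to the homework file.
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook.
sheets <- excel_sheets(path)

# Read every sheet into a list of tibbles, named after the sheets.
all_sheets <- map(set_names(sheets), ~ read_excel(path, sheet = .x))

# n_max caps the number of data rows that are read from a sheet.
first_rows <- read_excel(path, sheet = sheets[1], n_max = 5)
nrow(first_rows)
```

Keeping the sheets in a named list makes it easy to pick one out by name, for example `all_sheets[[sheets[1]]]`.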
Answer the listed questions below.
* … `excel_sheets`.
* Using the `readxl` package, use an appropriate `read_*` function to read all sheets in the `.xlsx` file and store them as tibbles (data.frames). The `read_*` function will simply be referred to as the “read function” in the coming questions. When you read these sheets, you should see a lot of warning messages. We will investigate those in the coming questions.
* … `knitr::kable` and `head`. What are the column names? Does anything seem strange? Using the argument `n_max` in the read function, remove the last row.
* … `read_*` function parsed for the column `Statsdel`? Read the documentation of the appropriate function, then explain why this happens and how to fix it.
* … `tot_antal_fall` and `nya_fall_vecka`. What is the type of these variables and why have they been parsed as such? Correct them (in some way) so that they become numeric variables.
* Using the `summarise`, `across` and `where` functions, reproduce the number of COVID cases for each region as well as for the total (here named `Totalt_antal_fall`). Notice that this information can be found in another part of the sheet. What is the total number of cases? Which region has had the most cases so far? Which has had the fewest? Argue why looking at counts might be misleading when comparing regions.
* … `tot_antal_fall` and `nya_fall_vecka` against their corresponding week number using `geom_col`.

After the deadline has passed, you will be given access to another student’s repository on GitHub. You should provide summary feedback by responding to the “HW2 ready for grading!” issue. Copy the following checklist and use it in your review:
* Is the homework complete, i.e. are all questions in the homework answered?
* Is there a working link from the main repository `README.md` to `HW2.md`?
* Is any code showing? If yes, is there any text about it?
* Do the figures have proper axis names, and do you understand them?
* Did you get any different results in your submission?
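
Returning to the questions on regional totals and weekly plots: the two patterns involved, column-wise summaries with `summarise`, `across` and `where`, and bar charts with `geom_col`, can be sketched on made-up data. Everything in the tibbles below is hypothetical: the region columns, the date column `datum`, the week column `vecka` and all numbers are invented for illustration and do not come from the Folkhälsomyndigheten file.

```r
library(dplyr)
library(ggplot2)

# Invented daily counts per region (not real data).
daily <- tibble(
  datum = as.Date("2020-03-01") + 0:2,
  Region_A = c(1, 4, 10),
  Region_B = c(0, 2, 3)
)

# Sum every numeric column; the Date column is skipped by where(is.numeric).
region_totals <- daily %>%
  summarise(across(where(is.numeric), sum))

# Grand total across all regions.
total_cases <- sum(unlist(region_totals))

# Invented weekly series, drawn as one bar per week.
weekly <- tibble(
  vecka = c(10, 11, 12),
  nya_fall_vecka = c(5, 6, 9)
)

weekly_plot <- ggplot(weekly, aes(x = vecka, y = nya_fall_vecka)) +
  geom_col() +
  labs(x = "Week number", y = "New cases per week")
```

Selecting columns by type with `where` avoids typing out every region name, which is in line with the style advice about avoiding unnecessary repetition of variable names.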