Solutions to the exercises of this homework 3 should, just as for HW1-HW2, be written in an R-Markdown document with output: github_document
. Both the R-Markdown document (.Rmd-file) and the compiled Markdown document (.md file), as well as any figures needed for properly rendering the Markdown file on GitHub, should be uploaded to your Homework repository as part of a HW3
folder. Code should be written clearly in a consistent style, see in particular Hadley Wickham’s tidyverse style guide. As an example, code should be easily readable and avoid unnecessary repetition of variable names.
HW_data
and pull the most recent changes. If this does not work, delete the folder and clone a new version through an R project.Deadline for the homework is 2021-11-21 at 23.59. Submission occurrs as usual by creating a new issue with the title “HW3 ready for grading!” in your repository. Your peer review will be assigned on 2021-11-22 and is due 2021-11-24 at 12:00.
In the file HW_data/seo_marketing_data.sqlite
we have an example of a database often seen in search engine marketing. It contains data on who buys on that site (conversions). The data has been simplified a lot, but you can imagine it being from a site that rents out rooms for hotels. The database represents customers coming from a search engine (e.g. google), pressing an ad (a “click”) and whether or not they bought something (if a conversion occurred). There are two tables, “Clicks” and “Conversion_value”. Each id
represent a unique customer. Use the following code snippet to read tables from the database.
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "HW_data/seo_marketing_data.sqlite")
# List all available tables
dbListTables(con)
# Tables are "Conversion value" and "Clicks"
# Pull all data in a table to R.
df_clicks <- dbReadTable(con, "Clicks")
lubridate::wday
.julianday
.Date of Click | CL - 25 day | CL - 26 days | CL - 27 days | … |
---|---|---|---|---|
2020-.. | a number | number | …. | … |
2020-.. | a number | number | …. | … |
2020-.. | more numbers | … | …. | … |
The two tables should contain the sum of clicks and average conversion value. Display the last five rows with knitr::kable
. The tables will be sparse (many zeros). Hint: Use a suitable function to make the data in wide format.
The file HW_data/sl-api.sqlite
is a SQLite database containing all SL’s current stops, lines, and the stops on each line. The data are obtained from a call to TrafikLab’s SL Hållplatser och Linjer 2 API on 2019-11-17 using the httr
package. Note: Trafiklab has further documentation of the API and the variables.
Figure out and describe how the tables relate to eachother. Note: Answer the question by both studying the descriptive information on the linked Webseite and, to a smaller degree, by manually looking at the data from the NoSQL-Database file from the HW_data
folder. The later could, e.g., be useful to confirm what is written in the description. At the end, your answers is supposed to consists purely of text and possibly a figure, but no code.
Pull all data from the SQLite database into R. Note: The SiteId
, StopPointNumber
and StopAreaNumber
variables are all of the type integer
according to the data model, but have not been properly converted as part of the API call. As a consequence, convert them manually to numerical values after importing them into R.
Present a table of the number of active unique rail traffic stops (i.e. train, tram or metro stops in each ticket zone (ZoneShortName
in stopAreas
/stopPoints
). By “active” we mean stops that are part of the journey pattern of a line.
Choose a line, and plot the stops as points on a map with the name of each stop as a label. Write the code in such a way that it is easily reusable if you want to plot another line. In order to produce a map use the latitude and longitude coordinates and generate the plot using leaflet
package. As an example:
library(leaflet)
kräftriket <- data.frame(lon = 18.055, lat = 59.357, name = "Kräftriket")
m <- leaflet(kräftriket) %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=~lon, lat=~lat, popup = ~name)
m
For documents with output: html_output
or when working with the code in interactive mode you can then just print m
to see the map. This might, however, not work with .md
files (i.e. github_document) though! Instead, you can store the Leaflet input into a file and then read and display this again with knitr::includegraphics
:
#Store leaflet widget as a PNG, see https://stackoverflow.com/questions/31336898/how-to-save-leaflet-in-r-map-as-png-or-jpg-file
library(mapview)
mapshot(m, file = "Leaflet-plot.png")
knitr::include_graphics("Leaflet-plot.png")
stopAreas
and stopPoints
tables. What do the two tables have in common, e.g., are there any (unecessary) redudancies/columns? If so, which? Suggest another way to structure the two tables so that we dont loose any information but reduces the size of the database. Hint: Try using two different tables and then use joins to get back to the structure of stopAreas
and stopPoints
. A very operational way to check memory consumption for storing a variable in R is to use the object.size
function or the pryr::object_size
function.After deadline has passed, you will be given access to another students repository on GitHub. You should provide summary feedback by responding to the “HW3 ready for grading!” issue. Copy the following checklist and use it in your review:
* Is the homework complete, e.g. are all questions in the homework answered?
* Is any code showing? If yes, is there any text about it?
* Do the Figures have proper axis names, do you understand them?
* When did the most clicks occur, Sunday or Monday?
* Does the table show the last 5 days of the conversion lag and 5 rows?
* What type of join was used in 2 e?