Instructions

Solutions to the exercises of this homework 7 should, just as for HW1-HW6, be written in an R-Markdown document with output: github_document. Both the R-Markdown document (.Rmd-file) and the compiled Markdown document (.md file), as well as any figures needed for properly rendering the Markdown file on GitHub, should be uploaded to your Homework repository as part of a HW7 folder. Code should be written clearly in a consistent style, see in particular Hadley Wickham’s tidyverse style guide. As an example, code should be easily readable and avoid unnecessary repetition of variable names.

Note that there are new data-sets available in the HW_data repository. Downloading them by opening the associated R-project and issue a “pull”. If it fails, delete the HW_data folder on your computer and clone the repository again according to the instructions in HW2.

Deadline

This homework exercise corresponds to the re-examination of the homework part of the HT2021 course for students who particpated actively in the submissions of homeworks during the semester, but for various reasons did not pass one or several of their homework. Who has to which exercises depends sligtly on what homework was missed and is specified individually. All who followed the protocol should have instructions in their “Participating in HW7?” issue on what they need to do.

Deadline for this re-examination homework is 2022-03-31. Submission occurs as usual by pushing into your homework repository followed by a new issue with the title “HW7 ready for grading” in your repository or re-use the above mentioned issue. The date of passing the homework part, if you pass this additional homework, will be the date you raised your HW7 issue. Submissions beyond the March deadline are not accepted. Please also add a link from your repository’s README.md file to HW7/HW7.md.

Exercise 1 - SQL and baseball statistics.

In the database HW_Data/baseball.sqlite there is statistics and information on baseball teams, players and many other things relating to the two. In this question we expect you to answer with SQL queries unless otherwise stated. Below you can find code to connect to the dabatase.

db <- dbConnect(RSQLite::SQLite(), file.path("../HW_data/baseball.sqlite"))
dbListTables(db)
  1. The table AwardsPlayers contains data for different different awards over the years. A college of yours presents the following query. Describe in words what information they aim to extract and what steps the query takes to accomplish it.
" SELECT df.awardID, df.yearID, MAX(df.n), MIN(df.n) FROM (
    SELECT awardID , yearID, COUNT(*) as n FROM AwardsPlayers
    WHERE yearID >= 1999
    GROUP BY awardID, yearID
  ) as df
  GROUP BY df.awardID
"

where awardID is the ID (name) of the award, yearID is the year the award was given.

  1. The Teams table contains yearly stats and standings for each baseball team. Has any team had more than one park during any given year?

  2. What team and what player has had the largest increase in salary? The player and team salaries can be found in the Salaries table. Each individual is described by a playerID and each team by a teamID. The players name is found in the People table.

Exercise 2 - Fun & random facts but with proper pronounciation!

In this question we are going to extract random, fun or useless facts which are then to be transformed to phonetic transcripts! We are going to use this API to get the facts. A key feature of this API is that they are random, so there is a probability that you will get the same result twice.

  1. Get the Facts! Use the /random endpoint to get 5 different useless facts in english. Your code must assert that the facts are different (in some way). A useful starting point is to mock the data that you can receive. Given that you use specify a json response in english from the API a “mocked” response might look something like the following printed in the R console
$id
[1] "1"

$text
[1] "This is a little test"

or another response

$id
[1] "2"

$text
[1] "This is a second test which has another fact."

Any of the responses can be thought of as a response from a GET and the content function. Your code must make sure that if you get the first response twice, then you need to throw one away and get another one.

  1. Use the following API to try to translate any word longer than 4 letters to phonetic transcript. You might not be able to translate some words which the code needs to take care of. You may choose how to (or not to) handle words that contain or are in conjuction to special characters, e.g. “test:”, “'” or “é”. Display your facts with and without phonetic transcriptions.