Instructions

Solutions to the exercises of Homework 5 should, just as for HW1-HW4, be written in an R-Markdown document with output: github_document. Both the R-Markdown document (.Rmd file) and the compiled Markdown document (.md file), as well as any figures needed for properly rendering the Markdown file on GitHub, should be uploaded to your Homework repository as part of an HW5 folder. Code should be written clearly and in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. For example, code should be easily readable and avoid unnecessary repetition of variable names.

Note that there are new data sets available in the HW_data repository. Download them by opening the associated R project and issuing a “pull”. If this fails, delete the HW_data folder on your computer and clone the repository again according to the instructions in HW2.

Deadline

The deadline for the homework is 2021-12-05 at 23:59. Submission occurs as usual by creating a new issue with the title “HW5 ready for grading!” in your repository. Please also add a link from your repository’s README.md file to HW5/HW5.md. Your peer review will be assigned on 2021-12-06 and is due 2021-12-08 at 12:00.

Exercise 1: Lööf vs Löfven

The file ../HW_data/LoofLofvenTweets.Rdata contains tables Loof and Lofven of tweets from the period 2018-11-20 to 2018-11-30 mentioning “Lööf” and “Löfven”, respectively. The data were fetched with the R package rtweet, which provides a convenient R access point to the Twitter API. Load the data using the R function load.
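A minimal sketch of loading and inspecting the data (assuming the document is knitted from the HW5 folder, so that the relative path resolves):

load("../HW_data/LoofLofvenTweets.Rdata")  # creates the tables Loof and Lofven
ls()                                       # both tables should now appear in the workspace
dplyr::glimpse(Loof)                       # overview of the available columns

The call to glimpse is only there to see which columns (tweet text, timestamps, ids) are available for the tasks below.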

Tasks

  1. Construct a table tweets that joins the two tables and contains a variable Person identifying whether the observation comes from the “Lööf” or “Löfven” table. Tweets common to both tables should not be included in the join (see the sketch after this list).

  2. Illustrate how the intensity of tweets containing the word “statsminister” (or “Statsminister”) has evolved over time for the two Person:s using, e.g., histograms with time on the x-axis.

  3. Compute and plot the daily average sentiment of words in the tweet texts for the two Person:s. We define the average sentiment as the average strength of words common to the text and the sentiment lexicon at https://svn.spraakdata.gu.se/sb-arkiv/pub/lmf/sentimentlex/sentimentlex.csv. Note that the function separate_rows can be useful in splitting the text into words.
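A possible sketch of task 1, assuming the rtweet tables contain a status_id column that uniquely identifies each tweet (adjust the by argument to whatever identifier the tables actually contain):

library(dplyr)

loof   <- mutate(Loof,   Person = "Lööf")    # tag each table with its Person
lofven <- mutate(Lofven, Person = "Löfven")

# anti_join() drops the tweets that also occur in the other table, so tweets
# common to both tables are excluded before the rows are stacked together.
tweets <- bind_rows(
  anti_join(loof,   Lofven, by = "status_id"),
  anti_join(lofven, Loof,   by = "status_id")
)

Tasks 2 and 3 (filtering on the tweet text and splitting it into words with separate_rows) can then both start from this tweets table.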

Exercise 2: Nobel API v2

The 2021 Nobel lectures take place on 6–12 December. The Nobel Foundation even maintains an API for looking up information about the Nobel Laureates. We are going to use version 2 of this API.

Tasks:

  1. Fetch a list in JSON format with information on the Nobel Prizes in Literature from The Nobel Prize API version 2. Choose a range of years to fetch data for. The API follows the OpenAPI standard and the documentation can be found here. A large part of this task is to figure out how to read and work with the OpenAPI documentation (a hedged sketch of one possible request is given after the code block below).

  2. Extract all the prize motivations from the JSON list, convert them into a character vector of words, remove stop words, and visualize the frequencies of the remaining words in a word cloud. R packages for plotting word clouds include, e.g., wordcloud, wordcloud2 and ggwordcloud, and a list of stop words can be fetched by

library(readr)  # read_table() is from the readr package
stop_words_url <- "https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.txt"
stopwords <- read_table(stop_words_url, col_names = "words")
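A hedged sketch of the fetch in task 1: the endpoint and query parameters below (nobelPrizes, nobelPrizeCategory, nobelPrizeYear, yearTo) are assumptions about the version 2 API and should be verified against the OpenAPI documentation before use.

library(jsonlite)

# Literature prizes for an example range of years; the parameter names are
# assumptions and must be checked against the OpenAPI spec.
url <- paste0(
  "https://api.nobelprize.org/2.1/nobelPrizes",
  "?nobelPrizeCategory=lit&nobelPrizeYear=2000&yearTo=2021"
)
prizes <- fromJSON(url, simplifyVector = FALSE)  # keep the result as a nested list

str(prizes, max.level = 2)  # navigate the list to locate the prize motivations

From here the motivations can be pulled out of the nested list (e.g. with purrr::map or sapply), split into words, matched against the stop word list fetched above, and passed to a word-cloud function.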

Peer review

After the deadline has passed, you will be given access to another student’s repository on GitHub. You should provide summary feedback by responding to the “HW5 ready for grading!” issue. Copy the following checklist and use it in your review:

* Is the homework complete, e.g. are all steps in the homework done?

* What type of joins were used in question 1? Do they differ from your choice?

* Is the sentiment on average negative or positive for Lööf/Löfven? Are there any values above 2 or below -2 in the figure from Exercise 1?

* What type of join was used in Exercise 2?