Lab 8: Dynamic Documents
๐งโ๐ป Make sure to finish previous labs before continuing with this lab! It might be tempting to skip some stuff if youโre behind, but you may just end up progressively more lost โฆ we will happily answer questions from any lab each week ๐
5.11 Covid-19 reporting
One of the few โsuccess storiesโ of the pandemic was the way that the UK Health Security Agency rapidly made available some of the best monitoring data in the world to all researchers via a new UKHSA data dashboard. This dashboard still exists and has been expanded to cover more diseases the UK Government need to monitor, right from measles through to clostridioides difficile.
Covid-19 now primarily represents a risk to those with existing serious health conditions, so although monitoring continues it is less detailed. So for the lab today weโll use some of the historic data which was very detailed, covering the period from 19 March 2020 through to 16 November 2021. This data provides a great example of managing large real world data which is not in precisely the format you need for analysis, so we will get to cover lots of the data science pipeline: data import, tidying, visualisation, and reporting.
For this toy scenario it is late 2021 and you have landed a job as a data scientist reporting to Prof Sir Chris Whitty, Chief Medical Officer (CMO) for England, and Prof Sir Patrick Vallance FRS, Government Chief Scientific Adviser (GCSA), at a time when monitoring is rapidly updated.
Your first role in the job is to create a new RMarkdown document in which you will produce a mini report for the CMO and GCSA on the state of Covid-19. You need to create this in a way which can be rapidly and automatically updated when the data refreshes every week.
Exercise 5.75 I have created an archive of the data at https://www.louisaslett.com/Courses/DSSC/data/c19.csv
Do not download the data, instead copy and paste the permanent link into a read.csv()
call so that each time you compile the document it downloads the latest data (we want to emulate the situation where your report downloads the data every time you compile it and would automatically get new information if it was being refreshed).
Place that read.csv()
in an R code block near the start of your report with the setting that prevents the code being printed to the document (you want the data loading to be โsilentโ).
Click for solution
We then add this link to a code block somewhere near the start of the document, so like this:```{r,echo=FALSE}
c19 <- read.csv('https://www.louisaslett.com/Courses/DSSC/data/c19.csv')
```
Exercise 5.76 Load the data above into your working environment by running the read.csv()
function in the console.
You will often find you experiment on the console first to figure out the functions you want to embed in your document.
Look at the data frame that youโve loaded.
By using
as.Date()
convert thedate
column from a string to a date type.By using the
ordered()
function (look at the help), convert theage
column from a string to an ordered factor.Hint 1: be careful, the default ordering will be alphabetical because
age
is a string โฆ is that ordering correct? If not, you will need to manually override the order by following the help file!Hint 2: Look at the
levels
argument toordered()
.
Once you have figured out these commands, add them immediately after the read.csv()
function in your hidden RMarkdown code block.
Hint: if you find unwanted warnings or messages produced, you can more agressively silence the spurious output by adding warning=FALSE,message=FALSE
to your code block options.
Click for solution
## SOLUTION
library("tidyverse")
read.csv("https://www.louisaslett.com/Courses/DSSC/data/c19.csv") c19 <-
# See what the ages in the data are
sort(unique(c19$age))
ages <- ages
[1] "0_to_5" "18_to_64" "6_to_17" "65_to_84" "85+"
# So, clearly we want to take these unique levels in the order 1,3,2,4,5 based on that vector
# Update the data as instructed in the question
c19 |>
c19 <- mutate(date = as.Date(date),
age = ordered(age, levels = ages[c(1,3,2,4,5)]))
# Check that the age is ordered and with the correct levels by looking at the first few with head:
head(c19$age)
[1] 85+ 18_to_64 65_to_84 0_to_5 6_to_17 85+
Levels: 0_to_5 < 6_to_17 < 18_to_64 < 65_to_84 < 85+
Exercise 5.77 Create an opening section and write a short sentence crediting the source of the data, including a working hyperlink to the official government UKHSA data dashboard at https://ukhsa-dashboard.data.gov.uk/.
Following that, create a graphic (using ggplot) which plots date
on the x-axis, rate
on the y-axis, colours the line by areaName
and facets by age
. Make sure the code does not print out (so it looks professional) and add some surrounding text to explain the graphic.
Is there a problem with this? By reading the documentation for facet_wrap
or otherwise, fix the problem.
Click for solution
The graphic you will add should use the following code (wrapped in an R code block withecho=FALSE
):
## SOLUTION
ggplot(c19, aes(x = date, y = rate, colour = areaName)) +
facet_wrap(~ age) +
geom_line()
The problem here is that the rate in different age groups varies a lot, so we may want to allow each facet panel to have different scales:
## SOLUTION
ggplot(c19, aes(x = date, y = rate, colour = areaName)) +
facet_wrap(~ age, scales = "free") +
geom_line()

Exercise 5.78 Create another hidden code block which uses the Tidyverse to summarise over age groups and produce the total daily rate in each region and save this in a new data frame.
Note: you need to be really careful! The rate is per 100,000 and the number of people in each age range in each region is not the same, so you canโt just sum the rates!
Hint: a rate of 8469.3 per 100k in 85+ in London corresponds to a total of 12914 admissions, but a rate of 3158.2 per 100k in 65โ84 in London corresponds to a total of 29846 admissions. So create a total
column calculating the population size for each row, then summarise to get the new rate.
Click for solution
Taking into account the note and hint, we have thatrate
/100000 = value
/total
, where total
is unknown. So we solve for total
first, then group before summarising to compute the total rate per 100k across age ranges.
Then, we add a new code block with the following:
## SOLUTION
c19 |>
c19.regions <- mutate(total = value*100000/rate) |>
group_by(date, areaName) |>
summarise(totalrate = sum(value)/sum(total)*100000)
`summarise()` has grouped output by 'date'. You can override using the
`.groups` argument.
Exercise 5.79 Using this new data frame, create a new graphic in your report with ggplot which shows the total rate over time with line colour indicating the region.
Add the option warning=FALSE
to your block in RMarkdown to avoid the warning about missing values (these are from the start of the pandemic before there were any admissions in some regions).
Click for solution
You will add a new code block with the following:## SOLUTION
ggplot(c19.regions, aes(x = date, y = totalrate, colour = areaName)) +
geom_line()
Warning: Removed 34 rows containing missing values or values outside the scale
range (`geom_line()`).

Exercise 5.80 Add another hidden code block which subsets the full data to retain only the last 7 days available in the data, in only the "North East and Yorkshire"
region. Call the data frame recent_rates_ne
.
Note: do not use a fixed date to find the last 7 days in the data: it must be automatically calculated. In other words, if you recompile your document a week from now, it should use whatever is the last 7 days at that time without you changing anything.
Click for solution
## SOLUTION
# First figure out the most recent 7 days
# This gets the unique dates, sorting them most recent to oldest
sort(unique(c19$date), decreasing = TRUE)
all_dates <-# Then, the date 7 days ago will be in all_dates[7]
# First subset:
c19 |>
recent_rates_ne <- filter(areaName == "North East and Yorkshire",
>= all_dates[7]) date
Exercise 5.81 Add to your hidden code block.
By using pivot_wider
in the Tidyverse or otherwise, create a data frame recent_rates_ne_wide
like this (these are the most recent 7 days at the time of writing):
# A tibble: 7 ร 6
date `0_to_5` `6_to_17` `18_to_64` `65_to_84` `85+`
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-11-16 170. 90.9 616 2482. 7746
2 2021-11-15 170. 90.7 615. 2478. 7739.
3 2021-11-14 169. 90.6 614. 2475 7734.
4 2021-11-13 168. 90.3 613. 2471. 7724.
5 2021-11-12 168. 90 612. 2468. 7716.
6 2021-11-11 168. 89.9 611. 2464. 7709.
7 2021-11-10 167. 89.8 610. 2460. 7700.
Hint: if the columns are in a funny order on your first attempt, you might find names_sort = TRUE
a useful argument for pivot_wider
.
Click for solution
## SOLUTION
pivot_wider(recent_rates_ne |> select(date, age, rate),
recent_rates_ne_wide <-names_from = "age",
values_from = "rate",
names_sort = TRUE)
# Looks right?
recent_rates_ne_wide
# A tibble: 7 ร 6
date `0_to_5` `6_to_17` `18_to_64` `65_to_84` `85+`
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-11-16 170. 90.9 616 2482. 7746
2 2021-11-15 170. 90.7 615. 2478. 7739.
3 2021-11-14 169. 90.6 614. 2475 7734.
4 2021-11-13 168. 90.3 613. 2471. 7724.
5 2021-11-12 168. 90 612. 2468. 7716.
6 2021-11-11 168. 89.9 611. 2464. 7709.
7 2021-11-10 167. 89.8 610. 2460. 7700.
Exercise 5.82 As a final line in the hidden code block weโve been building up, add the following code:
::kable(recent_rates_ne_wide) knitr
This is a function which automatically creates an HTML table from a data frame, so will do a nicely formatted table of the most recent 7 day totals for the North East and Yorkshire.
Click for solution
Once you put it in the table should look like this:## SOLUTION
::kable(recent_rates_ne_wide) knitr
date | 0_to_5 | 6_to_17 | 18_to_64 | 65_to_84 | 85+ |
---|---|---|---|---|---|
2021-11-16 | 169.9 | 90.9 | 616.0 | 2481.5 | 7746.0 |
2021-11-15 | 169.5 | 90.7 | 615.1 | 2477.8 | 7739.4 |
2021-11-14 | 169.2 | 90.6 | 613.9 | 2475.0 | 7733.8 |
2021-11-13 | 168.5 | 90.3 | 613.1 | 2471.4 | 7723.9 |
2021-11-12 | 168.5 | 90.0 | 611.9 | 2468.5 | 7716.4 |
2021-11-11 | 167.6 | 89.9 | 610.7 | 2464.2 | 7708.9 |
2021-11-10 | 167.1 | 89.8 | 609.7 | 2460.1 | 7699.5 |
This is a good point to check that your report is successfully compiling in HTML, PDF and Word (if available). Make sure there are no ugly bits of code or warning messages showing up.
To see the solution RMarkdown file, download here. To see the solution HTML file which that compiles to, download here.
Exercise 5.83 Prof Whitty and Prof Vallence are impressed with your work and want to roll it out to the official UK Government Website https://www.gov.uk/, so you need to update the styling to match correctly.
To do this, install the govdown
package:
install.packages("govdown")
You can see the documentation for this package here: https://ukgovdatascience.github.io/govdown/
Next, replace all of the document preamble by copy-pasting the following in place instead (this should be the whole preamble now):
---
output:
govdown::govdown_document:
font: "sans-serif" # (default) or "new-transport"
logo_url: "https://ukgovdatascience.github.io/govdown/"
logo_text: "GOV.UK"
page_title: "My Covid Dashboard"
title: "My Covid Dashboard"
phase: alpha
---
Recompile your document and ensure it looks like the GOV.UK website.
Click for solution
To see the solution RMarkdown file in GOV.UK style, download here. To see the solution HTML file in GOV.UK style which that compiles to, download here.Well done! If Dominic Cummings was still at No.10, maybe you would have been rewarded with lifetime free entry to Klute! (It was a close shave, but apparently is reopened!) ๐๐ชฉ๐บ
๐๐ Done, end of lab! ๐๐