COVID-19 in Quebec

March 31, 2020 - 7 minutes
Statistics R ggplot2

Part 1, Written in March

Due to my inexperience in maintaining Blogs via Github pages, each time I post a new post, the whole site will be rendered completely, some of my functions are not longer working due to the update format of the website, but I still leave this post as a note for my studies.

As the current COVID-19 situation in Canada, I am also forced to work from home. During the weekend, I try to learn some stuff for fun. This post will illustrate the map reflecting the COVID-19 count in Quebec.

To obtain the data, we first scrape the information from the goverment’s website Quebec.ca, who provides daily updates upto the Administrative Region level. This time I will use rvest to grab the information.

library(rvest)
library(dplyr)
library(tidyr)
library(stringr)

url <- 'https://www.quebec.ca/sante/problemes-de-sante/a-z/coronavirus-2019/situation-coronavirus-quebec/'

covid <- read_html(url) %>% html_nodes("table") %>% 
                            html_table() %>% .[[1]] %>% 
                            as_tibble() %>%
                            slice(-1, -n()) %>% 
                            rename(region = 1, Case = 2) %>%
                            separate(region, c("RES_CO_REG", "Name"), " - ") %>%
                            mutate(Case = str_replace_all(Case,'\U00A0', ''))%>%
                            mutate(Case = str_replace_all(Case,' ', ''))%>%
                            # two possible way of spacing according to experience
                            mutate_at('Case', as.integer) 

Here the id of Quebec Administrative Region (QAR) I use RES_CO_REG is to accomodate to the map file later on. The shapefile of QAR can be downloaded from Énergie et Ressources naturelles at Régions administratives. The Coordinate Reference System (CRS) of this shp file is NAD83, we need to convert it to WGS84 if we have more manipulation, says calling interactive map (see below). It is worth noticing that the region of Montreal is quite small in Quebec, one way is to include a map near Montreal or use inset map.

library(sf)
qc <- read_sf("maps/qc.shp")
# convert NAD83 to WGS84
qc <- st_transform(qc, crs = 4326)

mtl <- tibble(name = 'Montreal', Long = -73.58781, Lat = 45.50884)
mtl <- st_as_sf(mtl, coords = c("Long", 'Lat'), remove = FALSE, 
                      crs = 4326, agr = "constant")

This time I will also use a relatively new package sf instead of sp to handle mapping. The sf package allow us have simple way to merge the data with map and plot the result.

library(ggplot2)
library(ggrepel)

qc <- left_join(qc %>% select(RES_NM_REG, RES_CO_REG, geometry), covid %>% select(-Name))

p <- ggplot() + geom_sf(data = qc, aes(fill = log10(Case))) + 
   scale_fill_viridis_c(labels = c(1, 10, 100, 1000), direction = -1) + theme_bw() + 
   theme(plot.title = element_text(hjust = .5),
         plot.subtitle = element_text(hjust = .5),
         plot.caption = element_text(hjust = .5)) +
   guides(fill = guide_colourbar(barwidth = 20, barheight = .8))

p1 <- p + labs(title = "COVID-19 in Quebec",
               subtitle = paste(Sys.Date()),
               caption = "Source: Quebec.ca", 
               fill = "Cases") + theme(legend.position = "none")

p2 <- p + labs(title = "COVID-19 near Montreal",
               subtitle = paste(Sys.Date()),
               caption = "Source: Quebec.ca", 
               fill = "Cases") + 
          geom_sf(data = mtl) + xlab('') +  ylab('') + 
          geom_label_repel(data = mtl, aes(x = Long, y = Lat, label = name),
                           fontface = "bold", nudge_x = -.3, nudge_y = -.2) + 
          coord_sf(xlim = c(-75, -72.5), ylim = c(45, 47), expand = FALSE)


library(ggpubr)
ggarrange(p1, p2, ncol = 2, common.legend = TRUE, legend = "bottom")

Alternatively, we can also use an interactive map, which can easily zoom-in.

### interactive
library(tmap)
# tmap_mode("plot")
tmap_mode("view")
tm_shape(qc) +  tm_polygons("Case", title = "COVID19 Case", 
                            style = "fixed",
                            legend.reverse = TRUE,
                            breaks = c(1, 10, 100, 200, 300, 500, 1500, Inf)) +
                tm_layout(aes.palette = list(seq = "-viridis"))

Similarly, I happen to notice that Santé Montréal also publish the table for Montreal, by similar steps and using the map data from the website of Ville de Montréal, we can also plot a rough situation in Montreal. Note that there are many cases, marked as Territory to be confirmed, the final may NOT be very precise but just give you a genral idea.

Note: This may not be reproducible since Santé Montréal changes their table format every time they updated the data (at least three consecutive updates on March 30th, 31st and April 2nd).

# http://donnees.ville.montreal.qc.ca/dataset/polygones-arrondissements
mtl <- read_sf("maps/mtl.shp")

url <- 'https://santemontreal.qc.ca/en/public/coronavirus-covid-19/#c36391'
cvmtl <- read_html(url) %>% html_nodes("table") %>% 
                            html_table() %>% .[[2]] %>% 
                            as_tibble() %>%
                            slice(-c(n() - 1, n())) %>% 
                            rename(NOM = 1, Case = 2) %>%
                            mutate(NOM = str_replace(NOM,'\\*','')) %>%
                            mutate(Case = str_replace_all(Case,',','')) %>%
                            mutate(Case = str_replace_all(Case,'<','')) %>%
                            mutate_at('Case', as.integer) 

cvmtl$NUM <- c(24, 9, 71, 7, 27, 72, 11, 1, 10, 3,
               17, 18, 6, 23, 74, 16, 75, 2, 5, 13,
               22, 8, 19, 25, 76, 15, 14, 77, 21, 12,
               20, 26, 4)

# daily update from sante montreal

cvmtl <- cvmtl %>% select(NUM, Case)

mtl <- mtl %>% select(NUM, NOM, TYPE) %>% left_join(cvmtl) %>% select(-NUM)

library(tmap)
# tmap_mode("plot")
tmap_mode("view")
tm_shape(mtl) +  tm_polygons("Case", title = "COVID19 Case", 
                            style = "fixed",
                            legend.reverse = TRUE,
                            breaks = c(0, 5, 10, 50, 100, 200, Inf)) +
                            tm_layout(aes.palette = list(seq = "-viridis"))

Part 2, Update in September

Since the format has changed, what I will do is to grab the information once and save it, so that the plots will work even after I have re-rendered the whole blog.

In this part, I will only consider the situation in Montreal, data in csv format will be downloadable on Santé Montréal website. Reading csv is standard procedure, but note that the name of the borough and linked cities are in French, it would be easier to read them using specific encoding.

Currently, the data record in Montreal is as follows, Number of Confirmed Cases, Distribution of Cases %, Rate per 100k perople, Number of Death, Rate of Mortality per 100k people. In this demonstration, I will only show the Rate per Million people, which is Rate per 100k people divided by 10.

The map obtaining from Ville de Montréal website will be modified and saved as RDS file using the technique I mentioned in the post with title “Saving Simple Features sf as RDS”.

library(dplyr)
library(stringr)
library(sf)
library(readr)

link <- "https://santemontreal.qc.ca/fileadmin/fichiers/Campagnes/coronavirus/situation-montreal/municipal.csv"

cvmtl <- read_delim(link, ";", 
                          escape_double = FALSE, 
                          locale = locale(decimal_mark = ",",
                                          grouping_mark = "", 
                                          encoding = "ISO-8859-1"),
                          trim_ws = TRUE)


colnames(cvmtl) <- c('name', 'confirmed', 'distribution', 'rate_pm', 'death', 'drate_pm')

cvmtl <- cvmtl %>% slice(-(n()-1), -n()) %>% 
                   mutate(name = str_replace_all(name,'\u0096', '-')) %>%
                   mutate(rate_pm = str_replace_all(rate_pm, '\\*', '')) %>%
                   mutate(drate_pm = str_replace_all(drate_pm, '\\*', '')) %>%
                   mutate(rate_pm = str_replace_all(rate_pm, ',', '.')) %>%
                   mutate(drate_pm = str_replace_all(drate_pm, ',', '.')) %>%
                   mutate(confirmed = str_replace_all(confirmed, '< ', '')) %>%
                   mutate(death = str_replace_all(death, '< ', '')) %>%
                   mutate_at(c('confirmed', 'distribution', 'rate_pm', 'death', 'drate_pm'), as.numeric) %>%
                  # n.p. as NA, ignore error message
                   mutate(rate_pm = rate_pm/10) %>% # rate per million is rate per 100k divided by 10
                   mutate(drate_pm = drate_pm/10)
                   
mtl <- readr::read_rds("mtl.rds")

cvmtl$NUM <- c(24, 9, 71, 7, 27, 72, 11, 1, 10, 3,
               17, 18, 6, 23, 74, 16, 75, 2, 5, 13,
               22, 8, 19, 25, 76, 15, 14, 77, 21, 12,
               20, 26, 4)

mtl <- mtl %>% left_join(cvmtl) %>% select(-NUM)

library(tmap)
# tmap_mode("plot")
tmap_mode("view")
tm_shape(mtl) +  tm_polygons("rate_pm", title = "COVID19 Infectious per Million", 
                             style = "fixed",
                             legend.reverse = TRUE,
                             breaks = c(0, 50, 100, 125, 150, 175, 200, Inf)) +
  tm_layout(aes.palette = list(seq = "-viridis"))

Analysis on COVID-19 Vaccine in US

May 21, 2021 - 5 minutes
Miscellaneous ggplot2

COVID-19 in Montreal

September 19, 2020 - 4 minutes
Statistics R Shiny

Linear Mixed Models via rstan

August 30, 2020 - 6 minutes
Statistics R