
To preface, I'm relatively new to R and am trying to replicate the plot from a previous user's question, found here: Stack Overflow question.

Instead of daily data limited to a recent set of dates, I want to plot every time point in my data, by quarter. I keep running into problems because I don't understand each part of their code well enough to adapt it sensibly.

Edit: I've tried to follow the example, however, I still can't replicate the example provided. I'm trying to get the labels to the right, and change the x-axis to display Q1 2015, Q2 2015, etc. I've placed my attempt at the code below:

library(readxl)
library(ggrepel)
library(tidyverse)
library(ggplot2)
library(zoo) # provides as.yearqtr(), which is used below

owid <- read_xlsx("/Desktop/testR.xlsx") %>%
  filter(date >= "2014-01-01" & date <= "2020-07-01") %>% 
  select(location, date, outcome) %>%
  arrange(location, date) %>%
  group_by(location) %>%
  complete(date = seq.Date(as.Date("2014-01-01"), 
                           as.Date("2020-07-01"), 
                           by="quarter")) %>%
  fill(outcome) %>%
  ungroup() %>%
  mutate(location = factor(location),
         location = fct_reorder2(location, outcome,
                                 outcome)) %>%
  mutate(datenew= as.Date(date, format= "%d.%m.%Y")) %>%
  mutate(label = if_else(datenew == max(datenew), 
                         as.character(location), 
                         NA_character_)) %>%
  mutate(yq = as.yearqtr(datenew)) 


 G01 <-
    owid %>%
      ggplot(aes(x=datenew, y=outcome, group=location,
                 color=location)) +
      geom_point() + 
      geom_line() +
      theme_minimal() + 
      labs(y="",
           x="") +
      theme(panel.grid.major.x = element_blank(),
            panel.grid.major.y = element_line(linetype = "dashed"),
            panel.grid.minor.y = element_blank(),
            panel.grid.minor.x = element_blank(),
            plot.title.position = "plot",
            plot.title = element_text(face="bold"),
            legend.position = "none") +
      scale_y_continuous(breaks=c(seq(0, 70, 10))) +
      scale_x_date(breaks = as.Date(c("2015-01-01", 
                                      "2015-04-01",
                                      "2015-07-01",
                                      "2015-10-01",
                                      "2016-01-01", 
                                      "2016-04-01",
                                      "2016-07-01",
                                      "2016-10-01",
                                      "2017-01-01", 
                                      "2017-04-01",
                                      "2017-07-01",
                                      "2017-10-01",
                                      "2018-01-01", 
                                      "2018-04-01",
                                      "2018-07-01",
                                      "2018-10-01",
                                      "2019-01-01", 
                                      "2019-04-01",
                                      "2019-07-01",
                                      "2019-10-01",
                                      "2020-01-01",
                                      "2020-04-01",
                                      "2020-07-01")),
                   labels = scales::date_format("%Y-%m"),
                   limits = as.Date(c("2015-01-01",
                                      "2020-07-01")))
    
    G01 +
      geom_text_repel(aes(label = gsub("^.*$", " ", label)), # This will force the correct position of the link's right end.
                      segment.curvature = -0.1,
                      segment.square = TRUE,
                      segment.color = 'grey',
                      box.padding = 0.1,
                      point.padding = 0.6,
                      nudge_x = 0.15,
                      nudge_y = 1,
                      force = 0.5,
                      hjust = 0,
                      direction="y",
                      na.rm = TRUE, 
                      xlim = as.Date(c("2015-01-01", "2020-07-01")),
                      ylim = c(0,70)) +
    
      geom_text_repel(data = . %>% filter(!is.na(label)),
                      aes(label = paste0("  ", label)),
                      segment.alpha = 0, ## This will 'hide' the link
                      segment.curvature = -0.1,
                      segment.square = TRUE,
                      # segment.color = 'grey',
                      box.padding = 0.1,
                      point.padding = 0.6,
                      nudge_x = 0.15,
                      nudge_y = 1,
                      force = 0.5,
                      hjust = 0,
                      direction="y",
                      na.rm = TRUE, 
                      xlim = as.Date(c("2015-01-01", "2020-07-01")),
                      ylim = c(0,70))

My current result is shown here.

  • SO isn't a coding service. If you want to make this kind of plot I would suggest to start with the referenced post and adjust it to your needs and data. As long as the question simply asks for "Please write code for me." I'm afraid I have to vote to close it. – stefan Mar 28 '22 at 07:36
  • Thanks stefan. I've modified my post accordingly. – a_swoosh Mar 28 '22 at 18:49

1 Answer

I'm new to SO, but I'll give it a go. If you add the year quarter information as a column and plot that as your x variable (without calculating per quarter means), you will end up with many points per quarter and a plot that is hard to read.

Try running this to see what I mean:

library(tidyverse)
library(ggrepel)
library(zoo)

keep <- c("Israel", "United Arab Emirates", "United Kingdom",
          "United States", "Chile", "European Union", "China",
          "Russia", "Brazil", "World", "Mexico", "Indonesia",
          "Bangladesh")

owid <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv") %>%
  filter(location %in% keep) %>%
  filter(date >= "2021-01-01" & date <= "2022-02-12") %>% ## edit
  select(location, date, total_vaccinations_per_hundred) %>%
  arrange(location, date) %>%
  group_by(location) %>%
  complete(date = seq.Date(as.Date("2021-01-01"), 
                           as.Date("2022-02-12"), ## edit
                           by="day")) %>%
  fill(total_vaccinations_per_hundred) %>%
  ungroup() %>%
  mutate(location = factor(location),
         location = fct_reorder2(location, total_vaccinations_per_hundred,
                                 total_vaccinations_per_hundred)) %>%
  mutate(label = if_else(date == max(date), 
                         as.character(location), 
                         NA_character_)) %>%
  mutate(yq = as.yearqtr(date)) ## add year quarter column


owid %>%
  ggplot(aes(x=yq, y=total_vaccinations_per_hundred, group=location,
             color=location)) +
  geom_point() + 
  geom_line() +
  theme_minimal() + 
  labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
       subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
       caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
       y="",
       x="") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed"),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = "plot",
        plot.title = element_text(face="bold"),
        legend.position = "none") +
  geom_label_repel(aes(label = label),
                   nudge_x = 1,
                   hjust = "left", direction="y",
                   na.rm = TRUE) +
  scale_x_yearqtr(limits = c(min(owid$yq), max(owid$yq)), 
                  format = "%YQ%q") ## scale axis by year quarter

Instead, I think what you may want to do is leave the data frame as is and set the breaks in the plot manually. This is similar to what they did in the example, except that you set the date breaks at the start of each quarter (the first of January, April, July and October). Doing things manually is painful but, in my experience, dealing with dates in R is never completely painless.

Using your example again (just add this to the bottom of the previous code to run):

owid %>%
  ggplot(aes(x=date, y=total_vaccinations_per_hundred, group=location,
             color=location)) +
  geom_point() + 
  geom_line() +
  theme_minimal() + 
  labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
       subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
       caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
       y="",
       x="") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed"),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = "plot",
        plot.title = element_text(face="bold"),
        legend.position = "none") +
  geom_label_repel(aes(label = label),
                   nudge_x = 1,
                   hjust = "left", direction="y",
                   na.rm = TRUE) +
  scale_x_date(breaks = as.Date(c("2021-01-01", ## set manually
                                  "2021-04-01",
                                  "2021-07-01",
                                  "2021-10-01",
                                  "2022-01-01",
                                  "2022-04-01")),
               labels = scales::date_format("%b %d"),
               limits = as.Date(c("2021-01-01",
                                  "2022-04-01")))
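The quarterly breaks can also be generated rather than typed out, and labelled in the "Q1 2015" style the question asks for. Here is a minimal base R sketch; `quarter_label` and `quarter_breaks` are names I made up for illustration, and I have not run this against your spreadsheet:

```r
# Hypothetical helper (not part of any package): format Dates as "Q1 2015".
quarter_label <- function(d) {
  d <- as.Date(d)
  q <- as.POSIXlt(d)$mon %/% 3 + 1   # $mon runs 0-11, so 0-2 is Q1, 3-5 is Q2, ...
  paste0("Q", q, " ", format(d, "%Y"))
}

# One break at the start of every quarter, generated instead of listed by hand.
quarter_breaks <- seq(as.Date("2021-01-01"), as.Date("2022-04-01"), by = "quarter")

print(quarter_label(quarter_breaks))

# In the plot, these would stand in for the manual breaks and date_format() labels:
# scale_x_date(breaks = quarter_breaks,
#              labels = quarter_label,
#              limits = range(quarter_breaks))
```

For your own data you would swap in your start and end dates, e.g. "2015-01-01" and "2020-07-01". If the labels should sit to the right of the last point, the x-axis limits probably need to extend past the last date so there is room for them.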

If you do actually want one data point per quarter, then you could add the year-quarter column to your data frame (done in the first block of code here) and summarise the data before plotting, similar to this.

One last time:

owid %>% 
  group_by(location, yq) %>% ## group by location and year quarter
  summarise(total_vaccinations_per_hundred = mean(total_vaccinations_per_hundred,
                                                  na.rm = TRUE), ## quarterly mean, skipping days with no data yet
            .groups = "drop") %>%
  ggplot(aes(x=yq, y=total_vaccinations_per_hundred, group=location,
             color=location)) +
  geom_point() + 
  geom_line() +
  theme_minimal() + 
  labs(title = "Cumulative COVID-19 vaccination doses administered per 100 people",
       subtitle = "This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses).",
       caption = "Source: Official data collected by Our World in Data — Last updated 13 February, 11:40 (London time)",
       y="",
       x="") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed"),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = "plot",
        plot.title = element_text(face="bold"),
        legend.position = "none")
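One thing to keep in mind: the code above takes the mean within each quarter. For a cumulative series you might prefer the last value in each quarter. This is a small sketch on made-up data showing both side by side; the `toy` data frame and its numbers are invented for illustration, and `last()` assumes the rows are already sorted by date:

```r
library(dplyr)
library(zoo)

# Made-up cumulative series: one row per day, value simply counts the days.
toy <- tibble(
  location = "A",
  date     = seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day"),
  value    = seq_along(date)
)

quarterly <- toy %>%
  mutate(yq = as.yearqtr(date)) %>%              # e.g. "2021 Q1"
  group_by(location, yq) %>%
  summarise(mean_value = mean(value, na.rm = TRUE),
            last_value = last(value),            # end-of-quarter value
            .groups = "drop")

print(quarterly)
```

Which summary is appropriate depends on what your outcome measures, so treat this as a starting point rather than a recommendation.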