
I am attempting to plot multiple time series variables on a single line chart using ggplot. I am using a data.frame which contains n time series variables and a column of time periods. Essentially, I want to loop through the data.frame and add exactly n geom_line layers to a single chart.

Initially I tried using the following code, where:

  • df = data.frame containing n time series variables, and 1 column of time periods
  • wid = n (number of time series variables)
  p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))  
  for (i in 1:wid) {
    p <- p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
  } 
  ggplotly(p)

However, this only produces a plot of the final time series variable in the data.frame. I then investigated further and found that the following two sets of code produce completely different results:

p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))
i = 1
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 2
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 3
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
  ggplotly(p)

Plot produced by code above

p <- ggplot() +
    scale_color_manual(values=c(colours[1:wid]))
p = p + geom_line(aes(x=df$Time, y=df[,1], color=var.lab[1]))
p = p + geom_line(aes(x=df$Time, y=df[,2], color=var.lab[2]))
p = p + geom_line(aes(x=df$Time, y=df[,3], color=var.lab[3]))
  ggplotly(p)

Plot produced by code above

In my mind, these two sets of code are identical, so could anyone explain why they produce such different results?

I know this could probably be done quite easily using autoplot, but I am more interested in the behavior of these two snippets of code.

Kayla
  • The actual plot is built when you print it (here, via `ggplotly`). Only then is `i` evaluated. Btw., your code violates several principles behind ggplot2. You should approach this in a completely different way (i.e., you should first melt the data.frame). – Roland Oct 06 '20 at 12:35
  • Hi there Roland. Thanks, that explains it! Also thanks for the advice on the principle violations; this explains why I am getting strange results in some cases. I will do some further reading on ggplot2 to avoid these in the future. – Kayla Oct 06 '20 at 12:39
  • You should reshape your data so you need only a single `geom_line` call instead of several. Basically, look at `pivot_longer` so you can have a single variable that can be mapped to `color` – csgroen Oct 06 '20 at 13:00
  • Thank you for the advice, csgroen: pivot_longer plus some much needed additional reading on ggplot is exactly what I needed. – Kayla Oct 06 '20 at 13:31
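The lazy evaluation described in the comments can be checked directly. The following is a minimal sketch, assuming ggplot2 3.0 or later (where `aes()` supports `!!` unquoting); the two-column `df` is a hypothetical stand-in for the data in the question. It compares the data each layer ends up with, first when `i` is looked up at build time and then when its value is forced into the mapping inside the loop.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data
df <- data.frame(A = c(1, 2, 3), B = c(10, 20, 30), Time = 1:3)

p <- ggplot()
for (i in 1:2) {
  # aes() stores the expression df[, i]; i is not looked up yet
  p <- p + geom_line(aes(x = df$Time, y = df[, i]))
}

# The plot is built only now. By this point i is 2,
# so both layers resolve df[, i] to the same column.
lazy_same <- identical(layer_data(p, 1)$y, layer_data(p, 2)$y)

q <- ggplot()
for (i in 1:2) {
  # !! inserts the current value of i into the stored expression
  q <- q + geom_line(aes(x = df$Time, y = df[, !!i]))
}

# Each layer now keeps its own column
forced_differ <- !identical(layer_data(q, 1)$y, layer_data(q, 2)$y)
```

This also accounts for the two snippets in the question: with literal indices there is nothing left to look up later, whereas with a variable `i` every layer sees whatever value `i` holds when the plot is finally drawn.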

1 Answer

Adding one geom_line per column in a loop works, but it is a workaround rather than idiomatic ggplot2. If you want to do it that way, I'd use aes_string, which takes the column name as a string. But it's a hack.

df <- data.frame(Time = 1:20,
                 Var1 = rnorm(20),
                 Var2 = rnorm(20, mean = 0.5),
                 Var3 = rnorm(20, mean = 0.8))

vars <- paste0("Var", 1:3)
col_vec <- RColorBrewer::brewer.pal(3, "Accent")

library(ggplot2)
p <- ggplot(df, aes(Time))
for (i in seq_along(vars)) {
    # aes_string() receives the column name as a string, so vars[i] is evaluated right away
    p <- p + geom_line(aes_string(y = vars[i]), color = col_vec[i], lwd = 1)
}
p + labs(y = "value")
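As a side note, `aes_string()` has been soft-deprecated since ggplot2 3.0.0 in favour of tidy evaluation. A rough equivalent of the loop above, sketched under the assumption that the `.data` pronoun is available (ggplot2 3.0 or later), builds the layers with `lapply()` so that each layer gets its own copy of `i`. The colour names here are arbitrary placeholders, not the palette used in the answer.

```r
library(ggplot2)

set.seed(1)  # only to make this sketch reproducible
df <- data.frame(Time = 1:20,
                 Var1 = rnorm(20),
                 Var2 = rnorm(20, mean = 0.5),
                 Var3 = rnorm(20, mean = 0.8))

vars <- paste0("Var", 1:3)
col_vec <- c("steelblue", "darkorange", "forestgreen")  # placeholder colours

# One layer per column; each function call has its own i,
# so the lazily evaluated mapping still finds the right column.
layers <- lapply(seq_along(vars), function(i) {
  force(i)
  geom_line(aes(y = .data[[vars[i]]]), color = col_vec[i], lwd = 1)
})

# A list of layers can be added to a plot in one step
p <- ggplot(df, aes(Time)) + layers + labs(y = "value")
```

This remains the same workaround as before (one layer per column, colours set by hand, no legend); it only swaps the deprecated helper for the currently recommended mechanism.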


How to do it properly

To build this plot properly, you need to pivot the data first, so that each aesthetic (aes) is mapped to a single variable in your data frame. Here that means we need one column in the data frame to map to color. Hence, we pivot_longer and plot again:

library(tidyr)
df_melt <- pivot_longer(df, cols = Var1:Var3, names_to = "var")

ggplot(df_melt, aes(Time, value, color = var)) +
    geom_line(lwd = 1) +
    scale_color_manual(values = col_vec)
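To see what the pivot does to the shape of the data, here is a small sketch on a hypothetical two-row data frame (the values are made up for illustration). It uses the same `pivot_longer` call as above, relying on the default `values_to = "value"`.

```r
library(tidyr)

# Tiny made-up example: 2 time points, 3 series
small <- data.frame(Time = 1:2,
                    Var1 = c(0.1, 0.2),
                    Var2 = c(1.1, 1.2),
                    Var3 = c(2.1, 2.2))

small_long <- pivot_longer(small, cols = Var1:Var3, names_to = "var")

dim(small_long)          # 6 rows (2 time points x 3 series), 3 columns
names(small_long)        # "Time" "var" "value"
unique(small_long$var)   # "Var1" "Var2" "Var3"
```

Each original column becomes a block of rows labelled by `var`, which is why a single `geom_line` with `color = var` is enough to draw one line per series.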


csgroen
  • Thank you - this is precisely what I was trying to accomplish. I've been using R for a while, but this is the first time I am really playing around with the visual side of things, so thank you so much for pointing me in the right direction. – Kayla Oct 06 '20 at 13:33
  • I feel you. `ggplot` has quite a distinct syntax and 'logic' and many people who are extremely proficient in base R have to 're-think' how things are done to fit its logic. In the beginning, it seems counter-intuitive, but I think it becomes quite elegant when you put in the effort to figure it out – csgroen Oct 06 '20 at 13:57