
I am trying to plot 35 individual time series (102 data points each) using ggplot and geom_line. I'd also like to overlay the grand mean of the individual series at each time point as a second geom_line with a different color or alpha.

Here is a sample from my data:

> dput(head(mdata, 10))
structure(list(Individual = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), Signal = c(-0.132894911, -0.13, 0, 0, 0, 0.02, 0.01, 
0.01, 0, 0.02), Time = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 
0.8, 0.9)), row.names = c(NA, 10L), class = "data.frame")

I've done this before with summarySE; however, it is no longer compatible with the current version of R. I've tried to use two separate data frames (one with the individual data and one with the mean data) and overlay them, but I think that because I've melted the individual data (from a 35 x 102 data frame to a 3570 x 3 one), I am getting an error that says:

"Aesthetics must be either length 1 or the same as the data (102): group".

Then I tried using stat_summary with fun.data, but I still get an error:

Error: geom_line requires the following missing aesthetics: y

ggplot(data=mdata,aes(x=Time, y=Signal, group=Individual, ymin=-1, ymax=3))+ 
  geom_line()+
  stat_summary(fun.data="mean", geom="line", color = "red")

Here is a dropbox link to the example data frame and graph I need as an output.

Any advice would be greatly appreciated! I've seen similar problems elsewhere, but I think the fact I am grouping my data within the aesthetic is causing me problems.

deepseefan
Kristin

3 Answers


You can add a layer geom_line() from the summary data frame.

# Create the summary using `dplyr`
library(dplyr)
library(ggplot2)
# One row per individual: that individual's mean Time and mean Signal
avg_group <- mdata %>% 
  select(Individual, Signal, Time) %>%
  group_by(Individual) %>% 
  summarise(avg_ind = mean(Time), avg_sig = mean(Signal))
# -------------------------------------------------------------------------
# > avg_group
# # A tibble: 35 x 3
# Individual avg_ind avg_sig
# <int>   <dbl>   <dbl>
# 1          1    5.05  0.107 
# 2          2    5.05  0.0947
# 3          3    5.05  0.0781
# 4          4    5.05  0.0362
# 5          5    5.05  0.0156
# 6          6    5.05  0.0182
# 7          7    5.05  0.774 
# 8          8    5.05  0.297 
# 9          9    5.05  0.517 
# 10         10    5.05  0.685 
# # … with 25 more rows
# -------------------------------------------------------------------------
# Then plot the graph using 
ggplot(mdata, aes(x = Time, y = Signal, group = Individual, ymin = -1, ymax = 3)) +
  geom_line() +
  geom_line(data = avg_group, aes(avg_ind, avg_sig), group = 1, color = "red") +
  theme_bw()
# -------------------------------------------------------------------------

Output:

[Image: avg_time_signal]

If you prefer stat_summary(), add a constant column to the data frame and use it as the grouping aesthetic for the summary layer, so that the mean is taken across all individuals at each Time value. You can do that as follows:

# > head(mdata, 2)
# Individual     Signal Time
# 1          1 -0.1328949  0.0
# 2          1 -0.1300000  0.1
# ------------------------------------------------------------------------
mdata$grand <- 1 

# > head(mdata, 2)
# Individual     Signal Time grand
# 1          1 -0.1328949  0.0     1
# 2          1 -0.1300000  0.1     1
# ------------------------------------------------------------------------
# Plot using grand as an explicit grouping variable for the summary layer
# (ggplot2 >= 3.3.0 names the argument `fun`; older versions use `fun.y`)
ggplot(mdata, aes(x = Time, y = Signal, group = Individual, ymin = -1, ymax = 3)) +
  geom_line() +
  stat_summary(aes(group = grand), fun = mean, geom = "line", color = "red") +
  theme_bw()

Output:

[Image: output_stat_summary]
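
For anyone who wants to try this without the original file, here is a minimal sketch on simulated data. The data frame `sim`, its sizes, and the random-walk signal are made up for illustration. It uses `aes(group = 1)` in place of the extra `grand` column, which should have the same effect of putting every row into a single group for the summary layer, and it assumes ggplot2 3.3.0 or later, where the argument is named `fun`.

```r
library(ggplot2)

set.seed(1)
n_ind  <- 5    # hypothetical number of individuals
n_time <- 20   # hypothetical number of time points

# Long-format data: one row per individual per time point
sim <- expand.grid(Time = seq(0, by = 0.1, length.out = n_time),
                   Individual = seq_len(n_ind))
# cumsum within each individual gives every series a random-walk shape
sim$Signal <- ave(rnorm(nrow(sim)), sim$Individual, FUN = cumsum)

p <- ggplot(sim, aes(x = Time, y = Signal, group = Individual)) +
  geom_line(alpha = 0.3) +
  # group = 1 overrides the per-individual grouping for this layer only,
  # so the mean is taken over all individuals at each Time value
  stat_summary(aes(group = 1), fun = mean, geom = "line", color = "red")

# Inspect what the summary layer computed: one row per time point
grand <- layer_data(p, 2)
nrow(grand)
```

Calling `layer_data()` on the second layer is a quick way to confirm that the red line really is the per-time-point mean rather than a per-individual one.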

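To make something like the output you expect (as shown in the link you shared):

# se() is not a base R function; this definition (standard error of the mean) is an assumption
se <- function(x) sd(x) / sqrt(length(x))
x0 <- mean(mdata$Time) + se(mdata$Time)  # left edge of the rectangle
ggplot(data = mdata, aes(x = Time, y = Signal, group = Individual, ymin = -1, ymax = 3)) +
  geom_line() +
  geom_rect(xmin = x0, xmax = x0 + 0.4, ymin = -1, ymax = -0.94, fill = "red") + theme_bw()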

This plot produces a warning because the rectangle's coordinates are fixed values rather than columns of the data, although the grand mean and standard error are used to position it.

Output:

[Image: output_geom_rec]

You may refer here for the se function.

deepseefan
  • Thank you! The second output using the stat_summary is precisely what I was looking for. I didn't consider adding a secondary variable. – Kristin Aug 24 '19 at 15:13

Have you tried something like this? The example uses the built-in co2 series; generalize it to your own data.

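library(ggplot2)

# Two example series built from the co2 data set
ts1 <- as.numeric(co2)
ts2 <- ts1 + 10
ts3 <- (ts1 + ts2) / 2  # In your case the mean can be calculated with a more dedicated function
idx <- seq_along(ts1)

ggplot() +
  geom_line(aes(x = idx, y = ts1, group = 1)) +
  geom_line(aes(x = idx, y = ts2, group = 2)) +
  # Mapping color to a label creates the legend entry; scale_color_manual() sets the actual color
  geom_line(aes(x = idx, y = ts3, group = 3, color = "Grand mean")) +
  scale_color_manual(values = c("Grand mean" = "red")) +
  labs(color = NULL, x = "Time", y = "Series")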

Result
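
To generalize this to many series, one option is to keep the series in a wide matrix (one row per individual, one column per time point), take column means for the grand mean, and reshape to long format for the individual lines. This is a sketch on simulated data; `wide`, `long`, and `grand` are illustrative names, not objects from the question.

```r
library(ggplot2)

set.seed(42)
n_ind  <- 35
n_time <- 102
time   <- seq(0, by = 0.1, length.out = n_time)

# Simulated wide data: rows are individuals, columns are time points
wide <- matrix(rnorm(n_ind * n_time), nrow = n_ind, ncol = n_time)

# Long format for the individual lines
# (as.vector() reads the matrix column by column)
long <- data.frame(
  Individual = rep(seq_len(n_ind), times = n_time),
  Time       = rep(time, each = n_ind),
  Signal     = as.vector(wide)
)

# Grand mean across individuals at each time point
grand <- data.frame(Time = time, Signal = colMeans(wide))

ggplot(long, aes(x = Time, y = Signal)) +
  geom_line(aes(group = Individual), alpha = 0.2) +
  geom_line(data = grand, color = "red")
```

Setting `group = Individual` inside the first layer rather than in `ggplot()` means the 102-row `grand` data frame does not inherit a grouping column it lacks, which is a likely source of the "Aesthetics must be either length 1 or the same as the data" error quoted in the question.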


This is not as elegant as stat_summary, but you could get the grand mean via:

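library(dplyr)

# Mean Signal across all individuals at each time point
# (df is the long-format data, i.e. mdata in the question)
by_time <- group_by(df, Time)
s <- summarise(by_time, meanSignal = mean(Signal, na.rm = TRUE))
s
# # A tibble: 102 x 2
#     Time meanSignal
#    <dbl>      <dbl>
#  1   0    -1.16e- 1
#  2   0.1  -1.15e- 1
#  3   0.2  -9.14e- 3
#  4   0.3   4.57e- 3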

Then plot using the two data frames, df, and s.

ggplot(df, aes(x = Time, y = Signal)) +
  geom_line(aes(group = Individual), alpha = 0.25) +
  geom_line(data = s, aes(x = Time, y = meanSignal), color = "#FF0000")

Which gives you:

[Image: multi-line graph with grand mean in red]
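
Since the question mentions having used summarySE before, the same grouped summary can be extended to return the standard deviation and standard error alongside the mean. This is a sketch on simulated data, not a drop-in replacement for summarySE; `df_sim`, `s_sim`, and the column names `n`, `sdSignal`, and `seSignal` are chosen here for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(7)
# Simulated long-format data with the same column names as in the question
df_sim <- expand.grid(Time = seq(0, by = 0.1, length.out = 102),
                      Individual = 1:35)
df_sim$Signal <- rnorm(nrow(df_sim), mean = sin(df_sim$Time))

# Per-time-point summary across individuals
s_sim <- df_sim %>%
  group_by(Time) %>%
  summarise(
    n          = sum(!is.na(Signal)),
    meanSignal = mean(Signal, na.rm = TRUE),
    sdSignal   = sd(Signal, na.rm = TRUE),
    seSignal   = sdSignal / sqrt(n)
  )

ggplot(df_sim, aes(x = Time, y = Signal)) +
  geom_line(aes(group = Individual), alpha = 0.15) +
  # Shaded band of +/- 1 standard error around the grand mean
  geom_ribbon(data = s_sim,
              aes(x = Time, ymin = meanSignal - seSignal, ymax = meanSignal + seSignal),
              inherit.aes = FALSE, fill = "red", alpha = 0.3) +
  geom_line(data = s_sim, aes(y = meanSignal), color = "red")
```

The ribbon layer sets `inherit.aes = FALSE` so that it does not pick up `y = Signal` from the main plot, a column the summary data frame does not have.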

indubitably