Consider the data created here:

set.seed(123)
data <- data.frame(time = c(1:1000, 1:1030, 1:2000), ID = rep(c('a','b','c'), c(1000,1030,2000)), variable = rnorm(4030))

We have measured some process variable over the life of 3 individuals, each denoted by a unique ID. Assume that time is a function of each individual's size, so each ID has a time series of a different length. We have noticed a pattern or property that occurs in every time series (i.e., at some point in each ID's life), but at a different time in each one. Let's say this happens at time == 500 for individual a, at time == 560 for individual b, and at time == 602 for individual c.

We can create a typical line plot to show each of the time series like this:

library(tidyverse)
data %>%
  ggplot(aes(time, variable, color = ID)) +
  geom_point() +
  geom_line()

But how can we center each time series on these "common data points" to show how similar this "pattern" is in each ID? In other words, in the plot, ID == c will start first because its event occurs latest in its series, and data point 602 for ID == c will be displayed at the same place as data point 500 for ID == a and data point 560 for ID == b.

Ryan

1 Answer

Can't you do:

data$event_time <- rep(c(500, 560, 602), c(1000, 1030, 2000))

data %>%
  ggplot(aes(time - event_time, variable, color = ID)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = 0, linetype = 2) +
  labs(x = "Time since life event")
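
As a quick sanity check on the shift above, the range of `time - event_time` per ID can be inspected. This is only a sketch in base R that rebuilds the example data from the question; `shifted_range` is a name introduced here for illustration.

```r
# Rebuild the example data from the question
set.seed(123)
data <- data.frame(
  time = c(1:1000, 1:1030, 1:2000),
  ID = rep(c("a", "b", "c"), c(1000, 1030, 2000)),
  variable = rnorm(4030)
)

# Event time repeated once per row of each ID, as in the answer
data$event_time <- rep(c(500, 560, 602), c(1000, 1030, 2000))

# Range of the shifted time axis for each ID (illustrative name)
shifted_range <- tapply(data$time - data$event_time, data$ID, range)
shifted_range
# a runs from -499 to 500, b from -559 to 470, c from -601 to 1398
```

Every series passes through 0 at its own event, and `c` has the most negative start, so it begins first on the shifted axis.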

Allan Cameron
  • This was clever, thanks! Is there a way to do this without making the extra column in `data`? Say I have a large data set with many more `ID`s that I wanted to pull these three out of. In that case, it would be convenient to do this directly in the `dplyr` pipe or `ggplot` call – Ryan Oct 27 '20 at 18:53
  • In other words, in the situation I just described, we would have to make `data2 <- dplyr::filter(data, ID %in% c('a','b','c'))` and then `data2$event_time <- rep(c(500, 560, 602), c(1000, 1030, 2000))` in order to do your process, so I was curious how it could be done without making new objects – Ryan Oct 27 '20 at 18:57
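
One possible way to address the follow-up in the comments, offered as a sketch and not as the answerer's method: keep the event times in a small named vector and look them up by `ID` inside the pipe, so neither `data2` nor an `event_time` column on `data` is needed. The names `event_times`, `time_since_event`, and `p` are hypothetical, introduced here for illustration; the event values are the ones stated in the question.

```r
library(dplyr)
library(ggplot2)

# Rebuild the example data from the question
set.seed(123)
data <- data.frame(
  time = c(1:1000, 1:1030, 1:2000),
  ID = rep(c("a", "b", "c"), c(1000, 1030, 2000)),
  variable = rnorm(4030)
)

# Hypothetical lookup: one event time per ID of interest
event_times <- c(a = 500, b = 560, c = 602)

p <- data %>%
  # Keep only the IDs that have an entry in the lookup
  filter(ID %in% names(event_times)) %>%
  # Shift each row by its own ID's event time; as.character() guards
  # against ID being a factor, unname() drops the lookup names
  mutate(time_since_event = time - unname(event_times[as.character(ID)])) %>%
  ggplot(aes(time_since_event, variable, color = ID)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = 0, linetype = 2) +
  labs(x = "Time since life event")

p
```

Because the lookup is indexed by name, it does not depend on row order or on knowing how many rows each `ID` has, which is where the `rep()` approach gets awkward on a larger data set.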