
I am trying to visualize longitudinal changes in a categorical variable. A small chunk of my data is shown below. Each row records the week up to which a participant was on a particular insurance type, with a maximum follow-up of 45 weeks. For example, participant 1 was on insurance type 1 for 40 weeks. Participant 2 was on insurance type 2 until week 24, then insurance type 1 until week 25, then back to insurance type 2 until week 35.

# tibble() replaces the deprecated data_frame()
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

I am trying to figure out a way to visualize this data. I have looked at other posts (1, 2, 3) and am still having trouble.

This is what I've been able to do so far, using geom_tile in the ggplot2 package: graph

I'm wondering if someone can help me figure out how to fill the color to the left? For example, ID 1 should have an entire row of pink, ID 2 should have green until week 24, then pink until week 25, then green again.

If you have any other suggestions as well, that would be great! My actual data has over 71,000 participants, so if you have any other options for how to visualize this data well, that would be greatly appreciated.

Thank you for the help!!

Initial code for graph: ggplot(df, aes(x=weeks, y=id, fill=insurance)) + geom_tile(color="grey20")
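One possible way to get the "filled to the left" rows described above is to treat each row as an interval rather than a single tile. The sketch below assumes that each `weeks` value marks the end of a spell that started at the previous `weeks` value for the same `id` (or at week 0 for the first row); that reading is an assumption, not something the data states explicitly. It uses only the rows for participants 1, 2 and 4 from the data above, and the names `spells` and `start` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Rows for participants 1, 2 and 4, copied from the question's data
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 4, 4),
  weeks = c(40, 24, 25, 35, 9, 40),
  insurance = as.factor(c(1, 2, 1, 2, 1, 3))
)

# Assumption: a spell runs from the previous row's week (or 0) to this row's week
spells <- df %>%
  group_by(id) %>%
  arrange(weeks, .by_group = TRUE) %>%
  mutate(start = lag(weeks, default = 0)) %>%
  ungroup()

# One horizontal bar per participant, coloured by insurance type over each spell
p <- ggplot(spells) +
  geom_rect(aes(xmin = start, xmax = weeks,
                ymin = id - 0.45, ymax = id + 0.45,
                fill = insurance)) +
  scale_y_continuous(breaks = unique(spells$id)) +
  labs(x = "Weeks of follow-up", y = "ID", fill = "Insurance")
p
```

With tens of thousands of participants the individual rows would become very thin, so sorting or sampling the IDs before plotting may be needed for the picture to stay readable.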

epifan
  • UPDATE: While TarJae's answer is greatly helpful, I would still really like any assistance in making my initial plot in the way I had hoped. I would love to see each participant have a row in a graph, where we can easily see where the changes occurred. This is similar to what can be done in the longCatEDA package, but transforming my data to wide format has proven very challenging so I'd like to keep it in long format and find a workaround. – epifan Mar 14 '22 at 21:18
  • Alternatively, any help in transforming it to the wide format needed by the longCatEDA package would also be appreciated :) – epifan Mar 14 '22 at 21:42
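
For the long-to-wide conversion asked about in the comments, a rough sketch is to expand each spell into one row per week and then pivot. This assumes week k belongs to the spell whose `weeks` value is the first one at or after k, and that a wide table with one column per week is what is wanted; it has not been checked against longCatEDA's exact input requirements. The names `weekly`, `wide`, `start` and `week` are illustrative only, and the data are the rows for participants 1, 2 and 4 from the question.

```r
library(dplyr)
library(tidyr)

# Rows for participants 1, 2 and 4, copied from the question's data
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 4, 4),
  weeks = c(40, 24, 25, 35, 9, 40),
  insurance = as.factor(c(1, 2, 1, 2, 1, 3))
)

# Expand each spell (start, weeks] into one row per week of follow-up
weekly <- df %>%
  group_by(id) %>%
  arrange(weeks, .by_group = TRUE) %>%
  mutate(start = lag(weeks, default = 0)) %>%
  ungroup() %>%
  filter(weeks > start) %>%                 # drop zero-length spells
  rowwise() %>%
  mutate(week = list(seq(start + 1, weeks))) %>%
  ungroup() %>%
  unnest(week)

# One row per participant, one column per week; weeks after follow-up ends are NA
wide <- weekly %>%
  mutate(insurance = as.integer(as.character(insurance))) %>%
  select(id, week, insurance) %>%
  pivot_wider(names_from = week, values_from = insurance,
              names_prefix = "w", names_sort = TRUE)
```

Here participant 2 ends up with type 2 in columns `w1` to `w24`, type 1 in `w25`, type 2 in `w26` to `w35`, and `NA` afterwards.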

2 Answers


Update, following the clarification in the comments:

library(tidyverse)

# One row per participant: a line out to each recorded week, and a point
# coloured by insurance type at each observation.
df %>% 
  ggplot(aes(x = factor(id), y = weeks)) +
  geom_segment(aes(xend = factor(id), y = 0, yend = weeks)) +
  geom_point(aes(color = insurance), size = 4, alpha = 0.6) +
  scale_x_discrete() +
  theme_light() +
  coord_flip() +
  xlab("ID") +
  scale_color_brewer(palette = "Dark2")


First answer: Maybe with a kind of lollipop chart:

library(tidyverse)

df %>% 
  group_by(id) %>% 
  mutate(row = row_number()) %>%   # observation index within each participant
  ungroup() %>% 
  # one label per observation (e.g. "2-3"), kept in data order on the axis
  mutate(id_new = paste0(id, "-", row),
         id_new = factor(id_new, levels = id_new)) %>% 
  ggplot(aes(x = id_new, y = weeks, color = insurance)) +
  geom_segment(aes(xend = id_new, y = 0, yend = weeks)) +
  geom_point(size = 4, alpha = 0.6) +
  theme_light() +
  coord_flip() +
  xlab("ID") +
  scale_color_brewer(palette = "Dark2")


TarJae
  • Oh this is great! Is there a way for each ID to have its own line and to show each change as a point? Rather than having the IDs separated by number of observations? – epifan Mar 14 '22 at 19:41

Maybe you want something like this:

Your data:

# tibble() replaces the deprecated data_frame()
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

You can use this code:

library(ggplot2)

ggplot(df, aes(x = weeks, y = id, color = factor(insurance))) +
  geom_point(size = 2) +
  scale_color_discrete("Insurance", labels = c("1", "2", "3", "4")) +
  facet_grid(. ~ insurance) +
  theme_bw()


Quinten
  • Hmm, this is helpful but not quite what I'm looking for. With over 71,000 participants, I think this will get pretty complicated. What I really want is a visualization of when these changes happen during follow-up. – epifan Mar 14 '22 at 18:43