Visualizing Longitudinal Categorical Data

Question

I am trying to visualize longitudinal changes in a categorical variable. A small chunk of my data is shown below. The data show how long each participant was on a particular insurance type, with the maximum amount of follow-up being 45 weeks. For example, participant 1 was on insurance type 1 for 40 weeks. Participant 2 was on insurance type 2 until week 24, then insurance type 1 until week 25, then back to insurance type 2 until week 35.

# data_frame() is deprecated; tibble() is the current equivalent
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

I am trying to figure out a way to visualize this data. I have looked at other posts (1, 2, 3) and am still having trouble.

This is what I've been able to do so far, using geom_tile in the ggplot2 package:

I'm wondering if someone can help me figure out how to fill the color to the left? For example, ID 1 should have an entire row of pink, ID 2 should have green until week 24, then pink until week 25, then green again.

If you have any other suggestions as well, that would be great! My actual data has over 71,000 participants, so if you have any other options for how to visualize this data well, that would be greatly appreciated.

Thank you for the help!!

Initial code for graph: ggplot(df, aes(x=weeks, y=id, fill=insurance)) + geom_tile(color="grey20")

UPDATE: While TarJae's answer is greatly helpful, I would still really like any assistance in making my initial plot in the way I had hoped. I would love to see each participant have a row in a graph, where we can easily see where the changes occurred. This is similar to what can be done in the longCatEDA package, but transforming my data to wide format has proven very challenging so I'd like to keep it in long format and find a workaround. — epifan, Mar 14 '22 at 21:18
Alternatively, any help in transforming it to the wide format needed by the longCatEDA package would also be appreciated :) — epifan, Mar 14 '22 at 21:42
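
For the wide format mentioned in the comment above, here is a minimal sketch of one possible reshaping. It assumes that rows are in chronological order within each id, that `weeks` marks the week at which each insurance spell ends, and that the target layout is one row per participant with one column per week (whether that matches what longCatEDA expects should be checked against its documentation). The names `start`, `week`, `wide`, and the `w` column prefix are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

wide <- df %>%
  group_by(id) %>%
  # Assumption: each spell starts where the previous one ended (week 0 for the first)
  mutate(start = dplyr::lag(weeks, default = 0)) %>%
  ungroup() %>%
  filter(weeks > start) %>%                       # drop zero-length spells
  rowwise() %>%
  mutate(week = list(seq(start + 1, weeks))) %>%  # weeks covered by this spell
  ungroup() %>%
  unnest(week) %>%
  mutate(insurance = as.integer(as.character(insurance))) %>%
  select(id, week, insurance) %>%
  arrange(week) %>%                               # so the week columns come out in order
  pivot_wider(names_from = week, values_from = insurance, names_prefix = "w") %>%
  arrange(id)
```

Weeks after a participant's last recorded spell are left as NA. Rows with `weeks = 0` cover no time under these assumptions and are dropped, so if week 0 is meant as a baseline state it would need separate handling.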

TarJae · Answer 1 · 2022-03-14T19:49:18.680

Update, after the clarification in the comments:

library(tidyverse)

df %>% 
  # color is mapped in geom_point() below; an argument outside aes() is ignored by ggplot()
  ggplot(aes(x=factor(id), y=weeks)) +
  geom_segment( aes(xend=factor(id), y=0, yend=weeks)) +
  geom_point( aes(color=insurance), size=4, alpha=0.6) +
  scale_x_discrete()+
  theme_light() +
  coord_flip() +
  xlab("ID")+
  scale_color_brewer(palette="Dark2")

First answer: Maybe with a kind of lollipop chart:

library(tidyverse)

df %>% 
  group_by(id) %>% 
  mutate(row = row_number()) %>% 
  mutate(id_new = paste0(id, "-", row)) %>% 
  ungroup() %>% 
  mutate(order = row_number()) %>% 
  mutate(id_new = factor(id_new, levels = id_new)) %>% 
  ggplot(aes(x=fct_reorder(id_new, order), y=weeks)) +
  geom_segment( aes(x=id_new, xend=id_new, y=0, yend=weeks, color=insurance)) +
  geom_point( aes(color=insurance), size=4, alpha=0.6) +
  theme_light() +
  coord_flip() +
  xlab("ID")+
  scale_color_brewer(palette="Dark2")

Oh this is great! Is there a way for each ID to have its own line and to show each change as a point? Rather than having the IDs separated by number of observations? — epifan, Mar 14 '22 at 19:41

score 0 · Answer 2 · answered Mar 14 '22 at 18:22

Maybe you want something like this:

Your data:

# data_frame() is deprecated; tibble() is the current equivalent
df <- tibble::tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

You can use this code:

library(ggplot2)

ggplot(df, aes(x=weeks, y=id, color=factor(insurance))) +
  geom_point(size=2) +
  scale_color_discrete("Insurance", labels=c("1", "2", "3", "4")) +
  facet_grid(.~insurance) +
  theme_bw()

Output plot:

Hmm, this is helpful but not quite what I'm looking for. With over 71,000 participants, I think this will get pretty complicated. What I really want is a visualization of when these changes happen during follow-up. — epifan, Mar 14 '22 at 18:43
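
A possible way to get one row per participant with the color filled to the left, as described in the question, is to compute where each spell starts and draw rectangles instead of tiles. This is only a sketch: it assumes rows are in chronological order within each id and that `weeks` is the week at which a spell ends. The `start` column and the `intervals` and `p` names are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
df <- tibble(
  id = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 10, 11, 12, 13, 13),
  weeks = c(40, 24, 25, 35, 41, 9, 40, 41, 14, 18, 39, 37, 0, 40, 39, 39, 40, 41, 0, 41),
  insurance = as.factor(c(1, 2, 1, 2, 1, 1, 3, 3, 1, 2, 1, 1, 4, 1, 1, 1, 1, 3, 3, 1))
)

# Assumption: a spell starts where the previous spell of the same id ended,
# and the first spell of each id starts at week 0.
intervals <- df %>%
  group_by(id) %>%
  mutate(start = dplyr::lag(weeks, default = 0)) %>%
  ungroup()

# One horizontal band per participant, colored by insurance type
p <- ggplot(intervals) +
  geom_rect(aes(xmin = start, xmax = weeks,
                ymin = id - 0.45, ymax = id + 0.45,
                fill = insurance)) +
  scale_y_reverse(breaks = unique(intervals$id)) +
  labs(x = "Weeks", y = "ID", fill = "Insurance") +
  theme_light()

p
```

With the sample data, ID 2 gets a band in the type 2 color up to week 24, a one-week band in the type 1 color, then type 2 again up to week 35. Rows with `weeks = 0` have zero width and do not show. For tens of thousands of participants the individual bands become very thin, so plotting a sample or sorting ids by their pattern may be needed.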
