Is there any possibility to add labels to the alluvia in ggalluvial?

Example plot:

library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

Now I would like to add the subject IDs/numbers as labels to the alluvia (not to the boxes). Is there any way to do so? My original data has far fewer subjects per alluvium (e.g. 2-5).

captcoma

1 Answer

I'd love to know a better way to accomplish this! Here's an illustration of the least bad trick I've come across. The key changes from your code are as follows:

  • The aesthetic mapping label = response is moved from the ggplot() call to the (first) geom_text() layer, so that it doesn't interfere with other text layers.
  • The additional geom_text() layer is paired with the flow stat in order to label each "flow" graphical element. Note that there are more such elements than are distinguishable visually.
  • The new text layer uses the new aesthetic mapping label = as.character(subject); the subject (actually cohort) IDs are numerical, so omitting the coercion to character would result in them being meaninglessly added together. (Maybe this behavior should be controllable by a new parameter?)
  • The new text layer uses the aesthetic mapping color = survey == "ms153_NSA" to separate the first time point from the latter two, and the manual color scale makes the former invisible and the latter black (note: NA would be more appropriate than "#00000000"). Though see the note below. (Setting data = subset(vaccinations, survey != "ms153_NSA") would appear to have the same effect, but it can actually confuse the labels, since the flow stat partitions the flows differently when the first time point is removed.)
  • The labels are offset from the strata onto the flows by nudge_x = -.25.

Note: An approach like this can produce the desired behavior with the alluvium stat. With the flow stat, however, labels are overlaid at the middle time point (or at any axis other than the first and last), and the overlaid labels don't always agree, since the flows are arranged differently on the left and right sides of an axis. There are some toy examples of both in the vignette about stratum and lode ordering. I don't think there's any way around this currently.

library(ggalluvial)
#> Loading required package: ggplot2
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(aes(label = response),
            stat = "stratum", size = 3) +
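  # label each flow with its subject ID; the logical colour mapping
  # separates the first survey (TRUE) from the later two (FALSE)
  geom_text(aes(label = as.character(subject), color = survey == "ms153_NSA"),
            stat = "flow", size = 3, nudge_x = -.25) +
  # FALSE -> opaque black, TRUE -> fully transparent (hidden)
  scale_colour_manual(values = c("#000000ff", "#00000000")) +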
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

Created on 2020-02-19 by the reprex package (v0.3.0)
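
For comparison, here is a minimal, untested sketch of the alluvium-stat variant mentioned in the note above. It assumes that pairing geom_text() with stat = "alluvium" behaves like the flow-stat layer, but yields one label per alluvium at each axis, so no colour trick is needed. The fill mapping is moved into geom_stratum() so that it does not vary within an alluvium, and the name p, the text size, and the nudge are only illustrative choices.

```r
library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
p <- ggplot(vaccinations,
            aes(x = survey, stratum = response, alluvium = subject,
                y = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  # one ribbon per subject across all three surveys
  geom_alluvium() +
  # fill is mapped here only, so it stays constant within each alluvium
  geom_stratum(aes(fill = response), alpha = .5) +
  # one label per alluvium at each axis, nudged left off the strata;
  # the coercion to character keeps the numeric IDs from being summed
  geom_text(aes(label = as.character(subject)),
            stat = "alluvium", size = 2, nudge_x = -.25) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")
print(p)
```

With as many subjects as in vaccinations the labels will be crowded; this is more likely to be readable for data with only a few subjects per alluvium, as in the question.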

Cory Brunson