
I have a data frame that simulates the NFL season with 2 columns: team and rank. I am trying to use ggridges to make a distribution plot of the frequency of each team at each rank from 1-10. I can get the plot working, but I'd like to display the count of each team/rank in each bin. I have been unsuccessful so far.

   ggplot(results, 
       aes(x=rank, y=team, group = team)) +
   geom_density_ridges2(aes(fill=team), stat='binline', binwidth=1, scale = 0.9, draw_baseline=T) +
   scale_x_continuous(limits = c(0,11), breaks = seq(1,10,1)) +
   theme_ridges() +
   theme(legend.position = "none") +
   scale_fill_manual(values = c("#4F2E84", "#FB4F14",  "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)

Which creates this plot:

[Image: the resulting ridgeline histogram]

I tried adding in this line to get the count added to each bin, but it did not work.

   geom_text(stat='bin', aes(y = team + 0.95*stat(count/max(count)),
                         label = ifelse(stat(count) > 0, stat(count), ""))) +

Not the exact dataset but this should be enough to at least run the original plot:

   results = data.frame(team = rep(c('Jets', 'Giants', 'Washington', 'Falcons', 'Bengals', 'Jaguars', 'Texans', 'Cowboys', 'Vikings'), 1000), rank = sample(1:20,9000,replace = T))
D. Bryant

2 Answers

How about calculating the count for each bin, joining to the original data and using the new variable n as the label?

library(dplyr)    # for count, left_join
library(ggplot2)
library(ggridges) # for geom_density_ridges2, theme_ridges

results %>% 
  count(team, rank) %>% 
  left_join(results, by = c("team", "rank")) %>% 
  ggplot(aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team), 
                       stat = 'binline', 
                       binwidth = 1, 
                       scale = 0.9, 
                       draw_baseline = TRUE) +
  scale_x_continuous(limits = c(0, 11), 
                     breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#4F2E84", "#FB4F14",  "#7C1415", "#A71930", "#00143F",
                               "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
  geom_text(aes(label = n), 
            color = "white", 
            nudge_y = 0.2)

Result:

enter image description here
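
Joining the counts back to all 9,000 rows means each label is drawn once per matching row. As a minimal sketch of a variant, assuming the sample `results` data frame from the question, the per-bin counts can instead be passed only to the text layer through its `data` argument. The name `bin_counts` and the seed are illustrative, and the manual fill scale is left out for brevity:

```r
library(dplyr)    # count
library(ggplot2)
library(ggridges) # geom_density_ridges2, theme_ridges

# Sample data in the shape given in the question (seed added for reproducibility)
set.seed(1)
results <- data.frame(
  team = rep(c("Jets", "Giants", "Washington", "Falcons", "Bengals",
               "Jaguars", "Texans", "Cowboys", "Vikings"), 1000),
  rank = sample(1:20, 9000, replace = TRUE)
)

# One row per team/rank combination, with the count in column n
bin_counts <- count(results, team, rank)

p <- ggplot(results, aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team), stat = "binline", binwidth = 1,
                       scale = 0.9, draw_baseline = TRUE) +
  # The text layer gets its own data, so each label is drawn once
  geom_text(data = bin_counts, aes(label = n), color = "white", nudge_y = 0.2) +
  scale_x_continuous(limits = c(0, 11), breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none")
```

Printing `p` draws the plot. As with the original code, rows outside the x-axis limits are dropped with a warning.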

neilfws

Neilfws' answer is great, but I've always found the ggridges geoms difficult to work with in circumstances like this, so I usually recreate them with geom_rect:

library(dplyr)
library(ggplot2)
library(ggridges) # for theme_ridges

results %>%
  count(team, rank) %>%
  filter(rank<=10) %>%
  mutate(team=factor(team)) %>%
  ggplot() +
  geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin=team, fill=team,
                ymax=as.numeric(team)+n*0.75/max(n))) +
  geom_text(aes(x=rank, y=as.numeric(team)-0.1, label=n)) +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#4F2E84", "#FB4F14",  "#7C1415", "#A71930", 
                               "#00143F", "#0C264C", "#192E6C", "#136677", 
                               "#203731"), name = NULL) +
  ylab("team")

[Image: the resulting plot, as requested]

I especially like the level of fine control I get from geom_rect rather than ridgelines. But you do lose out on the nice bounding line drawn around each ridgeline, so if that's important then go with the other answer.

Dubukay
  • @Neilfws answer is really good, but your labeling is a lot easier to read when using the actual distribution. Namely, some of the bars that are much smaller than others don't fully display the text overlaid inside the bars. Your answer putting the labels under the axis makes it much more legible. Both answers solved the problem though. Thank you very much! – D. Bryant Oct 27 '20 at 12:54
  • @D.Bryant True, my answer illustrates just the basics of `geom_text`, but you could adjust both the color and position of the labels. – neilfws Oct 27 '20 at 21:42
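
The adjustment mentioned in the last comment could look something like the sketch below, which keeps the ridgeline plot but moves the labels just below each team's baseline and draws them in a dark colour so they do not depend on bar height. The specific values (`nudge_y = -0.15`, `size = 3`, `"grey20"`), the seed, and the name `p_below` are illustrative assumptions, not settings taken from either answer:

```r
library(dplyr)    # count
library(ggplot2)
library(ggridges) # geom_density_ridges2, theme_ridges

# Sample data in the shape given in the question (seed added for reproducibility)
set.seed(1)
results <- data.frame(
  team = rep(c("Jets", "Giants", "Washington", "Falcons", "Bengals",
               "Jaguars", "Texans", "Cowboys", "Vikings"), 1000),
  rank = sample(1:20, 9000, replace = TRUE)
)

p_below <- ggplot(results, aes(rank, team, group = team)) +
  geom_density_ridges2(aes(fill = team), stat = "binline", binwidth = 1,
                       scale = 0.9, draw_baseline = TRUE) +
  # Dark labels shifted below the baseline (negative nudge_y)
  geom_text(data = count(results, team, rank), aes(label = n),
            color = "grey20", size = 3, nudge_y = -0.15) +
  scale_x_continuous(limits = c(0, 11), breaks = seq(1, 10, 1)) +
  theme_ridges() +
  theme(legend.position = "none")
```

A positive `nudge_y` moves the labels back up into the bars, where a light colour such as white tends to read better against the fill.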