
ggplot2 alluvial plot: y-axis as percentages

I made this graph using ggplot2 and I'd like to change the y-axis to percentages, from 0% to 100% with a break every 10%. I know I can use:

+ scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, .1))

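but I still have a problem: the y values are counts, not proportions, and `percent` multiplies them by 100, so a count of 30000 is shown as 3,000,000% rather than as a share of the total. If I limit the axis to 100%, nothing is left in my graph. How can I manage this?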
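A base-R sketch of the mismatch. `to_percent()` is a hypothetical stand-in for a percent labeller that only multiplies by 100 and appends "%" (the real `scales::percent()` also rounds and groups digits, so its exact output differs). The 13-of-20 share comes from the example data below, where 13 of the 20 IDs are "B" at time 1.

```r
# Hypothetical stand-in for a percent labeller: value * 100, then "%"
to_percent <- function(x) sprintf("%.0f%%", 100 * x)

# A proportion is labelled as intended
to_percent(0.25)      # "25%"

# A raw count is treated as if it were a proportion
to_percent(30000)     # "3000000%"

# Dividing by the total first turns a count into a share
to_percent(13 / 20)   # "65%"
```

So the counts need to be rescaled before or while they are labelled; changing the axis limits alone cannot fix this.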

I have a dataset like this:

ID time value
1   1   B with G available
2   1   Generic
3   1   B with G available
4   1   Generic
5   1   B with G available
6   1   Generic
7   1   Generic
8   1   Generic
9   1   B with G available
10  1   B with G available
11  1   Generic
12  1   B with G available
13  1   B with G available
14  1   B with G available
15  1   Generic
16  1   B with G available
17  1   B with G available
18  1   B with G available
19  1   B with G available
20  1   B with G available
1   2   B with G available
2   2   Generic
3   2   B with G available
4   2   Generic
5   2   B with G available
6   2   Generic
7   2   Generic
8   2   Generic
9   2   B with G available
10  2   B with G available
11  2   Generic
12  2   B with G available
13  2   B with G available
14  2   B with G available
15  2   Generic
16  2   B with G available
17  2   switch
18  2   B with G available
19  2   B with G available
20  2   switch

which can be reproduced with this code (with "B" standing for "B with G available" and "G" for "Generic"):

PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))

so there is no y variable in my data that I could rescale myself.
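The heights that the alluvial plot stacks are, as far as I understand ggalluvial's default (a weight of 1 per row when no `y` aesthetic is given), just row counts. Each ID then contributes 1 at every time point, so each stack is `n_id` high. A base-R sketch (no plotting) that tabulates those counts and the shares a 0-100% axis should show:

```r
# Example data from the question ("B" = "B with G available", "G" = "Generic")
PIPPO <- data.frame("ID" = rep(c(1:20), 2),
                    "time" = c(rep(1, 20), rep(2, 20)),
                    "value" = c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),
                                "G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))

# Number of IDs in each stratum at each time point (the stacked heights)
counts <- table(PIPPO$time, PIPPO$value)
counts

# Every time point stacks up to the number of distinct IDs
n_id <- length(unique(PIPPO$ID))
rowSums(counts)

# Share of each stratum within its time point
shares <- prop.table(counts, margin = 1)
shares
```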

Here is my code and the plot I obtained:

library(ggplot2)
library(ggalluvial)

ggplot(PIPPO, 
       aes(x = time, stratum = value, alluvium = ID,
           fill = value, label = value)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward",
            color = "gray") + 
  geom_stratum() +
  theme(legend.position = "bottom") 

[Image: the resulting alluvial plot]

Could anyone help me?

On my real data, using

scale_y_continuous(label = scales::percent_format(scale = 100 / n_id))

I get this: [Image: alluvial plot with y-axis labels up to 84%]

with 84% as the maximum value (instead of 100%). How can I get the y-axis to run up to 100%, with a break every 10%?

And here is what I get with

scale_y_continuous(breaks = scales::pretty_breaks(10), label = scales::percent_format(scale = 100 / n_id))

[Image: alluvial plot with y-axis labels roughly every 14%]

I get these odd breaks, roughly one every 14%.
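A base-R sketch of why the labels land on 14%, 28%, ..., 84%: the break positions are still chosen as round numbers of counts, and `percent_format(scale = 100 / n_id)` only changes how those positions are labelled. It assumes 35,514 observations (the total jeff gives in the comments) and that `scales::pretty_breaks()` delegates to base `pretty()`, which is how I understand it to work.

```r
# Total number of observations in the real data (taken from the comments)
n_obs <- 35514

# Breaks are picked on the count scale, as pretty() would pick them
count_breaks <- pretty(c(0, n_obs), n = 10)
count_breaks

# Only the labels are rescaled, so round counts become odd percentages
round(100 * count_breaks / n_obs)

# The default breaks 10,000 / 20,000 / 30,000 map to 28% / 56% / 84%
round(100 * c(10000, 20000, 30000) / n_obs)
```

On this reading, the stacks probably did reach 100%; the highest labelled break was simply 30,000, below the top of the stack at 35,514.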

stefan
jeff

3 Answers


Using the `scale` argument of `percent_format()`, this can be achieved like so:

PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))

library(ggplot2)
library(ggalluvial)

n_id <- length(unique(PIPPO$ID))

ggplot(PIPPO, 
       aes(x = time, stratum = value, alluvium = ID,
           fill = value, label = value)) +
  scale_fill_brewer(type = "qual" , palette = "Set3") +
  # multiply counts by 100 / n_id, so a full stack (n_id IDs) is labelled 100%
  scale_y_continuous(labels = scales::percent_format(scale = 100 / n_id)) +
  geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward", color = "gray") + 
  geom_stratum() +
  theme(legend.position = "bottom") 

Created on 2020-05-19 by the reprex package (v0.3.0)

stefan
  • Thanks. What I'd like is a y-axis with breaks every 10% up to 100%. How could I do it? – jeff May 21 '20 at 08:31
  • In this case you have to set the breaks. Try `scale_y_continuous(breaks = scales::pretty_breaks(10), label = scales::percent_format(scale = 100 / n_id))`. – stefan May 21 '20 at 08:43
  • I'll edit my question to add what I get using `breaks`. – jeff May 21 '20 at 08:49
  • Hi @SabrinaG. Something odd seems to be happening with the normalization: it looks as if `n_id` is slightly larger than the number of observations used in the plot. This is tricky to solve without having a look at the real data. To get nice breaks you could try setting them via `breaks = seq(0, n_id, length.out = 11)`. However, this is probably (almost surely) not a solution to the underlying problem. – stefan May 21 '20 at 09:30
  • With absolute values on the y-axis I had breaks at 10,000, 20,000 and 30,000. When I changed the y-axis to percentages I got 28%, 56% and 84%, given that the total number of observations is 35,514. And when I use `breaks` it keeps plotting those percentages, adding others in between, and I don't understand why. Thanks anyway – jeff May 21 '20 at 10:16
  • Thanks @stefan! Your last suggestion solved the problem. – jeff May 21 '20 at 10:25
  • You are welcome @SabrinaG. If you want to do me a favor, mark the question as answered. Besides giving me some credit, it shows others with a similar problem that the solution worked, and it removes the question from the queue of questions still waiting for an answer. – stefan May 21 '20 at 11:11
  • Sorry @stefan, how may I mark the question as answered and give you some credit? I'm sorry for asking but I'm quite new! Thanks again – jeff May 22 '20 at 10:08
  • Hi @SabrinaG. No problem. Everything is fine. Look at https://stackoverflow.com/help/someone-answers on how to mark a question as answered. Thanks for your reply. – stefan May 23 '20 at 07:54
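A base-R sketch of stefan's `breaks = seq(0, n_id, length.out = 11)` suggestion: the breaks are placed at exact tenths of the stack height in count units, so after rescaling they come out as 0%, 10%, ..., 100%. It assumes `n_id` equals the 35,514 total from jeff's comment, and it reproduces the labeller arithmetic with `paste0()` instead of calling scales.

```r
n_id <- 35514  # assumed: number of distinct IDs in the real data

# 11 evenly spaced breaks from 0 to n_id, i.e. one every 10% of the stack
count_breaks <- seq(0, n_id, length.out = 11)

# Labels these positions receive under percent_format(scale = 100 / n_id)
pct_labels <- paste0(round(100 * count_breaks / n_id), "%")
pct_labels
```

In the plot these breaks would presumably be combined with the labeller from the answer, i.e. `scale_y_continuous(breaks = seq(0, n_id, length.out = 11), labels = scales::percent_format(scale = 100 / n_id))`, which matches the suggestion jeff reports as solving the problem.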

I assume you will need to create a new column of percentages: take the total number of rows, then divide the count of each value in your column by that total to get the percentage it represents.
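A base-R sketch of that idea for the alluvial data, under the assumption that the totals should be taken per time point. `share` is a hypothetical column name: each row gets the weight 1 / (number of rows at its time point), so the weights at each time point add up to 1.

```r
# Example data from the question
PIPPO <- data.frame("ID" = rep(c(1:20), 2),
                    "time" = c(rep(1, 20), rep(2, 20)),
                    "value" = c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),
                                "G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))

# ave() with FUN = length returns the group size (20 here) for every row of the group
PIPPO$share <- 1 / ave(PIPPO$ID, PIPPO$time, FUN = length)

# Within each time point the weights add up to 1, i.e. 100%
tapply(PIPPO$share, PIPPO$time, sum)

# Share of each value per time point (combinations that do not occur give NA)
by_value <- tapply(PIPPO$share, list(PIPPO$time, PIPPO$value), sum)
by_value
```

With such a column, the plot could map it as the weight, e.g. `aes(x = time, stratum = value, alluvium = ID, y = share, ...)`, so that each stack runs from 0 to 1 and the original `scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, .1))` would apply directly. This assumes ggalluvial uses the `y` aesthetic as the weight of each row, which is my reading of its documentation; it has not been checked against the real data.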

Daniel_j_iii

Simply normalising your y-values seems to do the trick:

library(ggplot2)

# divide by the maximum, so the largest mpg value is shown as 100%
ggplot(mtcars, aes(x = cyl, y = mpg / max(mpg))) +
  geom_point() +
  scale_y_continuous(labels = scales::label_percent())

Created on 2020-05-19 by the reprex package (v0.3.0)

MSR