
Is there an easy way to label the individual "blocks" in a "stacked" bar graph such as the one below? I'd like the labels to be positioned near the top of each block, but my latest approach somehow swaps the texts for USA and Mexico, as shown below.

example bar graph

Looking around for a solution, I've only found approaches where the y value for the text has to be pre-computed externally, which, aside from the extra logic, raises the issue of controlling the order in which the blocks are stacked.
I also found this stackoverflow question, where I got the idea of using geom="text" in a stat_bin (see code below).
Here's a trimmed-down code snippet illustrating my current approach. I'm not necessarily trying to fix this snippet; any generic idiom for labeling the areas of a stacked bar graph will do!
Edit (in view of the two answers this question has received so far):
I'd like to stress that I'd prefer solutions that don't require pre-computing the y position of the text.

# sample data source
df.StackData <- data.frame(
    QType = c("A4-1", "A4-1", "A4-1",  "B3", "B3", "B3"),
    Country = c("Canada", "USA", "Mexico", "Canada", "USA", "Mexico"),
    NbOfCases = c(1000, 1320, 380, 400, 1000, 812),
    AvgRate = c(17.2, 11.4, 44.21, 17.3, 15.3, 39.7),
    Comment = c("Can", "US", "Mex", "Can", "US", "Mex")
)

and the ggplot invocation. It produces the graph shown above, with the odd swap of labels (and also an extra legend, though this legend issue is easy to take care of; I only noticed it while preparing this question).

ggplot(data=df.StackData,
       aes(x=QType, y=NbOfCases, fill=Country))+
  geom_bar(stat="identity", width=1) +
  stat_bin(geom="text", aes(label=paste("R coef =",
                                        formatC(AvgRate, format="f", digits=3),
                                        "(", Comment, ")" ),
                            vjust=1.5, size=3 
                        )
  )

My initial attempts added a geom_text() to the graph as follows, but of course the y value was wrong (it placed the texts relative to the very bottom of the graph rather than relative to the bottom of the individual blocks)...

  ... +
  geom_text(mapping=aes(x=QType, y=NbOfCases, 
                        label=paste("R coef =",
                                    formatC(AvgRate, format="f", digits=3),
                                    "(", Comment, ")" ),
                         vjust=1.5),
            size=3)
mjv

4 Answers

Here's a solution. There are two steps. First, reorder the levels of the Country factor so that they match the order in which the countries appear in your data, df.StackData. Second, create another data.frame that holds the y-positions, computed as the cumulative sums of NbOfCases within each QType.

# reorder levels of factor to the same order as found in data
df.StackData$Country <- factor(df.StackData$Country, 
          levels=c("Canada", "USA", "Mexico"), ordered=TRUE)
p <- ggplot(data=df.StackData, aes(x=QType, fill=Country))
p <- p + geom_bar(aes(weight=NbOfCases))

# compute corresponding y-axis positions by cumulative sum
require(plyr)
df <- ddply(df.StackData, .(QType), function(x) {
    x$NbOfCases <- cumsum(x$NbOfCases)
    x
})

# then use geom_text with data = df (the newly created data)
p + geom_text(data = df,  aes(x=QType, y=NbOfCases, 
        label=paste("R coef =", 
        formatC(AvgRate, format="f", digits=3), 
        "(", Comment, ")" ), vjust=1.5), size=3)


Edit: If you don't want to calculate the y-position yourself, then you'll have to use stat_bin. Just reorder the levels of the Country column and it works:

# data
df.StackData <- data.frame(
    QType = c("A4-1", "A4-1", "A4-1",  "B3", "B3", "B3"),
    Country = c("Canada", "USA", "Mexico", "Canada", "USA", "Mexico"),
    NbOfCases = c(1000, 1320, 380, 400, 1000, 812),
    AvgRate = c(17.2, 11.4, 44.21, 17.3, 15.3, 39.7),
    Comment = c("Can", "US", "Mex", "Can", "US", "Mex")
)

# just add this: reorder the levels
df.StackData$Country <- factor(df.StackData$Country, 
          levels=c("Canada", "USA", "Mexico"), ordered=TRUE)

# your code again using stat_bin (just changed the width to 0.75)
ggplot(data=df.StackData,
       aes(x=QType, y=NbOfCases, fill=Country))+
  geom_bar(stat="identity", width=.75) +
  stat_bin(geom="text", size=4, aes(label=paste("R coef =",
                                        formatC(AvgRate, format="f", digits=3),
                                        "(", Comment, ")" ),
                            vjust=1.5))


Arun
  • Thank you, Arun, I'm trying to stay away from all the solutions that imply pre-computing the y position and feeding it to the geom_text, but it appears that may not be possible... Do you know of other idioms that do not require pre-computation? – mjv Mar 14 '13 at 21:16
  • Just try your first solution after doing this: `df.StackData$Country <- factor(df.StackData$Country, levels=c("Canada", "USA", "Mexico"), ordered=TRUE)` – Arun Mar 14 '13 at 21:21
  • Bingo that did it. I'm left with removing the undesired Legend, but that should be easy enough. Thank you! – mjv Mar 14 '13 at 21:30
  • Sure, I'll make an edit if I manage to find a way to remove the legend. – Arun Mar 14 '13 at 21:30
  • You're very kind. I can't work on this ATM, but this typically happens when ggplot figures it has extra/different aesthetics. I'll await a few hours in case some other R wizard comes along to teach us both a nifty trick, but I'll otherwise be sure to accept yr answer. It has been quite helpful. – mjv Mar 14 '13 at 21:35
  • @mjv, please check the edit. I know that the reason for 2 legends is because of two aesthetics. What I failed to notice was your inclusion of `size` *inside* `aes`. You should put it outside: `size=3` or `size=4`... Check the code and the plot. – Arun Mar 14 '13 at 21:58
  • Thank you, that did the trick! Also "duh" on the inside vs. outside of the aes() thing; I bump into that every once in a while and it always takes me a few iterations to get it right. BUT your factor ordering effectively solved the issue! FYI, do check `alexwhan`'s solution, whereby one can order the data.frame rather than the factor; depending on one's use, it is nice to keep the factor in its default alphabetical order. Bottom line: the factor order and the df order need to match; it doesn't matter which of the two drives the other. – mjv Mar 15 '13 at 02:10
  • As of 2021, this code gives an error of 'stat_bin() can only have an x or y aesthetic.' Could someone who understands the revised syntax give an update, pls, for how to generate this graph? – InColorado Oct 16 '21 at 01:09
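
For readers hitting the stat_bin() error mentioned in the last comment, here is a minimal sketch, assuming ggplot2 2.2.0 or later (where geom_col() exists and position_stack() accepts a vjust argument). The idea is to let geom_text() stack the labels with the same position adjustment the bars use, so no y position is pre-computed. The Label column and the object name p are made up for this illustration and are not part of the original code.

```r
library(ggplot2)

# Same sample data as in the question
df.StackData <- data.frame(
  QType = c("A4-1", "A4-1", "A4-1", "B3", "B3", "B3"),
  Country = c("Canada", "USA", "Mexico", "Canada", "USA", "Mexico"),
  NbOfCases = c(1000, 1320, 380, 400, 1000, 812),
  AvgRate = c(17.2, 11.4, 44.21, 17.3, 15.3, 39.7),
  Comment = c("Can", "US", "Mex", "Can", "US", "Mex")
)

# Hypothetical helper column: build the label text once to keep aes() short
df.StackData$Label <- paste("R coef =",
                            formatC(df.StackData$AvgRate, format = "f", digits = 3),
                            "(", df.StackData$Comment, ")")

p <- ggplot(df.StackData, aes(x = QType, y = NbOfCases, fill = Country)) +
  geom_col(width = 0.75) +
  # position_stack(vjust = 1) places each label at the top edge of its block;
  # the text's own vjust = 1.5 then nudges it just below that edge.
  # size is set outside aes() so it does not create a legend.
  geom_text(aes(label = Label),
            position = position_stack(vjust = 1),
            vjust = 1.5, size = 3)

p
```

Because bars and labels go through the same stacking, they should stay in step regardless of factor or row order; position_stack(vjust = 0.5) would center the labels in their blocks instead.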

Here is a solution

library(plyr)  # provides ddply()

# pos is the vertical midpoint of each block within its QType
df2 = ddply(df.StackData, .(QType), transform, 
 pos = cumsum(NbOfCases) - 0.5 * NbOfCases)

ggplot(data = df2, aes(x = QType, y = NbOfCases, fill = Country)) +
  geom_bar(stat = "identity") +
  geom_text(aes(y = pos, label = paste("R coef =", 
   formatC(AvgRate, format="f", digits=3), "(", Comment, ")" ))
  )
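
As a worked check of the cumsum(NbOfCases) - 0.5 * NbOfCases arithmetic above, here is a base-R sketch (no plyr) on the question's data. The column names top and mid are made up for illustration. It assumes the blocks are stacked bottom-to-top in row order within each QType; if the plot stacks them in a different order, the computed positions fall in the wrong blocks.

```r
# Sample data from the question (only the columns needed here)
df <- data.frame(
  QType = c("A4-1", "A4-1", "A4-1", "B3", "B3", "B3"),
  Country = c("Canada", "USA", "Mexico", "Canada", "USA", "Mexico"),
  NbOfCases = c(1000, 1320, 380, 400, 1000, 812)
)

# Running total within each QType = top edge of each block
df$top <- ave(df$NbOfCases, df$QType, FUN = cumsum)

# Subtract half the block height to get the block's vertical midpoint
df$mid <- df$top - 0.5 * df$NbOfCases

print(df)
# A4-1: top = 1000, 2320, 2700; mid = 500, 1660, 2510
# B3:   top =  400, 1400, 2212; mid = 200,  900, 1806
```

Note that ggplot2 2.2.0 and later stack groups in reverse by default (the first factor level ends up on top), so on a current version the row order would have to be matched to that before using positions computed this way.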


Ramnath
  • Thank you, Ramnath, I'm trying to stay away from all the solutions that imply pre-computing the y position and feeding it to the `geom_text`, but it appears that may not be possible... Do you know of other idioms that do not require pre-computation? – mjv Mar 14 '13 at 21:11

Here's an alternative - because your factor will be ordered alphabetically by default, I suggest reordering your dataframe to match this instead of reordering the factor to match the order of the dataframe. To my mind, this would allow a more general solution. The only reason you were getting a legend you didn't want is that you had size inside aes - I've fixed that below.

Using your data:

df.StackData <- with(df.StackData, df.StackData[order(Country),])

and you can then just use your original solution with stat_bin. I tested it with a slightly more complex dataset just to check that it works:

df.StackData <- data.frame(
  QType = rep(c("A4-1","B3"), each = 6),
  Country = rep(c("Canada", "USA", "Mexico", "UK", "Sweden", "Australia"), times = 2),
  NbOfCases = c(1000, 1320, 380, 400, 1000, 812, 542, 531, 674, 328, 795, 721),
  AvgRate = c(17.2, 11.4, 44.21, 17.3, 15.3, 39.7, 21.1, 25.3, 24.1, 31.3, 38.4, 36.1),
  Comment = rep(c("Can", "US", "Mex", "UK", "Swe", "Aus"), times = 2)  # same order as Country
)

Without sorting:

ggplot(data=df.StackData,
       aes(x=QType, y=NbOfCases, fill=Country))+
  geom_bar(stat="identity", width=1) +
  stat_bin(geom="text", aes(label=paste("R coef =", formatC(AvgRate, format="f", digits=3),
                                        "(", Comment, ")" ), vjust = 1), size=3)


After sort:

df.StackData <- with(df.StackData, df.StackData[order(Country),])


alexwhan
  • Thank you Alex, that is a good approach as well. And yes, putting various formatting properties inside rather than outside the aesthetic object is of course the reason (fairly enough, from ggplot's standpoint) for the extra legends. – mjv Mar 15 '13 at 01:52

In order to remove the extra legend you can use show_guide=FALSE. In your example:

ggplot(data=df.StackData,
       aes(x=QType, y=NbOfCases, fill=Country))+
  geom_bar(stat="identity", width=.75) +
  stat_bin(geom="text", size=4, aes(label=paste("R coef =",
                                        formatC(AvgRate, format="f", digits=3),
                                        "(", Comment, ")" ),
                            vjust=1.5), show_guide=FALSE)
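
In ggplot2 2.0.0 and later, show_guide was renamed to show.legend. The following is a minimal sketch of the two legend fixes discussed in this thread, assuming a current ggplot2 and using geom_text() with position_stack() in place of the stat_bin(geom="text") call, which newer versions reject. The names base, p_inside and p_outside are illustrative only.

```r
library(ggplot2)

# Trimmed-down version of the question's data
df.StackData <- data.frame(
  QType = c("A4-1", "A4-1", "A4-1", "B3", "B3", "B3"),
  Country = c("Canada", "USA", "Mexico", "Canada", "USA", "Mexico"),
  NbOfCases = c(1000, 1320, 380, 400, 1000, 812),
  Comment = c("Can", "US", "Mex", "Can", "US", "Mex")
)

base <- ggplot(df.StackData, aes(x = QType, y = NbOfCases, fill = Country)) +
  geom_col(width = 0.75)

# size inside aes(): treated as a mapped aesthetic, so ggplot2 adds a
# size scale and, with it, the unwanted second legend
p_inside <- base +
  geom_text(aes(label = Comment, size = 4),
            position = position_stack(vjust = 0.5))

# size outside aes(): a fixed parameter, so no scale is created;
# show.legend = FALSE additionally keeps the text layer out of all legends
p_outside <- base +
  geom_text(aes(label = Comment), size = 4,
            position = position_stack(vjust = 0.5),
            show.legend = FALSE)

p_outside
```

Moving size out of aes() is usually enough on its own; show.legend = FALSE is the more general switch when a layer should never contribute legend keys.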
Stedy
Felipe