To convey the relative frequencies of key words, I would like each "bar" in a plot to consist of one of the words repeated vertically according to its frequency. The ggplot code below removes the bar's outline and fill, but how can I create a "stack" of words as (or in) a bar according to the word's frequency? Thus "global" would start at the x-axis and repeat three times vertically, positioned at 1, 2, and 3 on the y-axis; "local" would stack five times, etc.

# a toy data frame
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)
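# pass the vectors straight to data.frame(); wrapping them in cbind()
# would coerce freq to character and break the numeric y-axis
df <- data.frame(words, freq)

library("ggplot2")
library("ggthemes")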
# a very unimpressive and uninformative plot
ggplot(df, aes(x = words, y = freq)) +
  geom_bar(stat = "identity", fill = "transparent", colour = "white") +
  theme_tufte()


I tried to use annotation_custom() with a textGrob but couldn't figure out how to repeat the word by its frequency.

Thank you for any guidance.

lawyeR

1 Answer

Here's a quick hack that might meet your needs (though I'd bet there's a better way to do this):

library(ggplot2)
library(ggthemes)
library(dplyr)

# Data frame with each word appearing a number of times equal to its frequency
df.freq = data.frame(words=rep(words, freq))

# Add a counter from 1 to freq for each word. 
# This will become the `y` value in the graph.
df.freq = df.freq %>% 
  group_by(words) %>%
  mutate(counter=1:n())

# Graph the words as if they were points in a scatterplot
p1 = ggplot(df.freq, aes(words, counter-0.5)) +
  geom_text(aes(label=words), size=12) +
  scale_y_continuous(limits=c(0,max(df.freq$counter))) +
  labs(x="Words",y="Freq") +
  theme_tufte(base_size=20) + 
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

# Save the plot, adjusting the aspect ratio so that the words stack nicely
# without large gaps between each copy of the word
pdf("word stack.pdf", width=6, height=3.5)
print(p1)  # explicit print() so the plot is also drawn when the script is source()d
dev.off()

Here's a png version, since SO doesn't display PDF files.

[Image: PNG rendering of the word-stack plot]
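If you'd rather not load dplyr just for the counter, the same expanded data frame can probably be built in base R. This is only a minimal sketch, assuming the `words` and `freq` vectors from the question; it relies on `rep()` and `sequence()` and nothing else:

```r
# toy data from the question
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)

# rep() repeats each word freq times; sequence() produces 1..freq for each word,
# so each row gets the height at which that copy of the word should be drawn
df.freq <- data.frame(words = rep(words, freq),
                      counter = sequence(freq))

head(df.freq, 4)
```

The resulting `df.freq` has one row per copy of each word, so it should drop into the `ggplot()` call above unchanged.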

If you're not set on using a stack of words, another option is to stick with a bar plot and add the word to the middle of each bar. For example:

# a toy data frame
words <- c("global", "local", "firm")
freq <- c(3, 5, 6)
df <-data.frame(words, freq)

ggplot(df, aes(words, freq)) +
  geom_bar(stat="identity", fill=hcl(195,100,65)) +
  geom_text(aes(label=words, y=freq*0.5), colour="white", size=10) +
  theme_tufte(base_size=20) + 
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())


eipi10