My data looks like this:

> str(bigrams_joined)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   71319 obs. of  2 variables:
 $ line   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ bigrams: chr  "in practice" "practice risk" "risk management" "management is"

I would like to plot the 10 or 15 most frequent bigrams in my dataset as a bar chart in ggplot2, with the bars running horizontally and the labels on the y-axis.

Any help with this is greatly appreciated!

Thank you

Davide Lorino

2 Answers

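You could do something like this: count each bigram with dplyr's count(), keep the 15 most frequent with top_n(), and plot them with ggplot2.

library(dplyr)
library(ggplot2)

bigrams_joined %>%
  count(bigrams) %>%       # one row per bigram, frequency in column n
  top_n(15, n) %>%         # keep the 15 highest counts (ties are kept)
  ggplot(aes(bigrams, n)) +
  geom_col() +             # bar length taken from n
  coord_flip()             # horizontal bars, labels on the y-axis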

Or, with the bars ordered by frequency:

bigrams_joined %>%
  count(bigrams) %>%                            # frequency of each bigram in n
  top_n(15, n) %>%                              # keep the 15 highest counts
  mutate(bigrams = reorder(bigrams, n)) %>%     # order factor levels by count
  ggplot(aes(bigrams, n)) +
  geom_col() +
  coord_flip()
phiver

Looks like you need to count() your bigrams (from dplyr), and then you need to order them in your plot. For that these days, I prefer to use something like fct_reorder() from forcats.

library(janeaustenr)
library(tidyverse)
library(tidytext)

data_frame(txt = prideprejudice) %>%
    unnest_tokens(bigram, txt, token = "ngrams", n = 2) %>%
    count(bigram, sort = TRUE) %>%
    top_n(15) %>%
    ggplot(aes(fct_reorder(bigram, n), n)) +
    geom_col() +
    coord_flip() +
    labs(x = NULL)
#> Selecting by n

Created on 2018-04-22 by the reprex package (v0.2.0).
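
For what it's worth, here is a minimal sketch of the same count, keep the top rows, reorder idea applied to a data frame shaped like the one in the question. The toy bigrams_joined below is made up for illustration (it is not the asker's data), and the sketch assumes dplyr >= 1.0.0 for slice_max() and ggplot2 >= 3.3.0, where mapping the labels to y gives horizontal bars without coord_flip().

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's bigrams_joined (same two columns).
bigrams_joined <- tibble(
  line    = c(1L, 1L, 1L, 2L, 2L, 3L),
  bigrams = c("risk management", "in practice", "risk management",
              "risk management", "in practice", "management is")
)

top_bigrams <- bigrams_joined %>%
  count(bigrams) %>%                        # one row per bigram, count in n
  slice_max(n, n = 2, with_ties = FALSE)    # use n = 15 on real data

# Labels on the y-axis, ordered by count; bars run horizontally.
p <- ggplot(top_bigrams, aes(x = n, y = reorder(bigrams, n))) +
  geom_col() +
  labs(x = "count", y = NULL)

print(p)
```

Setting with_ties = FALSE caps the result at exactly the requested number of rows; leave it at the default if you would rather keep every bigram tied for last place.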

Julia Silge