Topic Modeling: graphical representation of words with the greatest differences between two topics

Question

Text Mining with R introduces methods for unsupervised classification of documents, such as blog posts or news articles, in its chapter on topic modeling. I'm running the code at this link, but I do not know how to obtain Figure 6.3, "Words with the greatest difference in beta between topic 2 and topic 1".

Any suggestions please?

score 2 · Accepted Answer · answered Mar 02 '20 at 21:36

The book's source is available: click the edit button on any page and you are taken to the GitHub project with that page open for editing. Navigate to the chapter you need (an Rmd file) and look for the text closest to the image.

Thankfully, this image was also made with R, so you can check the plotting code directly in that chapter's source.

Posting for completeness:

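library(dplyr)
library(ggplot2)

# beta_spread has one row per term, with log_ratio = log2(topic2 / topic1)
beta_spread %>%
  group_by(direction = log_ratio > 0) %>%   # split terms by which topic they favor
  top_n(10, abs(log_ratio)) %>%             # keep the 10 largest differences per direction
  ungroup() %>%
  mutate(term = reorder(term, log_ratio)) %>%  # order the bars by log ratio
  ggplot(aes(term, log_ratio)) +
  geom_col() +
  labs(y = "Log2 ratio of beta in topic 2 / topic 1") +
  coord_flip()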
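
The plotting code expects a beta_spread data frame to exist already. In case that step is the missing piece, here is a minimal sketch of how such a table can be built with dplyr and tidyr. The small ap_topics tibble is a made-up stand-in (hypothetical terms and values) for the tidied per-topic word probabilities of a two-topic model, and the 0.001 cutoff is an assumption used to drop terms that are rare in both topics.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the tidied model output:
# one row per (topic, term) pair with its per-topic word probability beta.
ap_topics <- tibble(
  topic = c(1L, 2L, 1L, 2L, 1L, 2L),
  term  = c("bank", "bank", "party", "party", "rare", "rare"),
  beta  = c(0.008, 0.002, 0.001, 0.004, 0.0001, 0.0002)
)

beta_spread <- ap_topics %>%
  mutate(topic = paste0("topic", topic)) %>%               # 1, 2 -> "topic1", "topic2"
  pivot_wider(names_from = topic, values_from = beta) %>%  # one column per topic
  filter(topic1 > 0.001 | topic2 > 0.001) %>%              # assumed cutoff: drop terms rare in both
  mutate(log_ratio = log2(topic2 / topic1))                # > 0 means more typical of topic 2

print(beta_spread)
```

With the real model output in place of the toy tibble, the resulting beta_spread can be piped straight into the plotting code above. A log ratio is used rather than a plain difference so that a term twice as likely in topic 2 and a term twice as likely in topic 1 sit at the same distance from zero.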
