
I'm doing topic modelling following "Text Mining with R: A Tidy Approach" by Silge and Robinson.

The book does not show how to plot figure 3.6, which shows the words with the "greatest difference in β between topic 2 and topic 1".


I searched online, including for ways to subset the values by ranking them in descending and ascending order at the same time, starting from zero.

Best regards

Anders Jørgensen
  • You can find that code [here on the book's GitHub repo](https://github.com/dgrtwo/tidy-text-mining/blob/abe38c72c40ce8d12c9e6d3d2adcc317e524fc96/06-topic-models.Rmd#L100-L108); all the code that generates the book is available there. – Julia Silge Jun 22 '21 at 04:14

1 Answer


After creating the `beta_wide` object, apply this code (it needs dplyr, forcats and ggplot2):

    library(dplyr)
    library(forcats)
    library(ggplot2)

    beta_wide %>%
      group_by(log_ratio > 0) %>%              # one group per direction: topic 2 vs topic 1
      top_n(10, abs(log_ratio)) %>%            # 10 largest absolute differences in each group
      ungroup() %>%
      ggplot(aes(fct_reorder(term, log_ratio), log_ratio, fill = log_ratio > 0)) +
      geom_col(alpha = 0.8, show.legend = FALSE) +
      coord_flip() +
      theme_minimal() +
      labs(x = "words",
           y = "log2 ratio of beta in topic 2 / topic 1") +
      scale_fill_brewer(palette = "Set1")
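The code above assumes `beta_wide` already exists. As a rough sketch of how the book's linked code builds it (reproduced from memory, so treat the exact filter threshold as an assumption and check the repo), the per-topic word probabilities are spread into one column per topic, rare words are dropped, and the log2 ratio is taken. The small `ap_topics` tibble here is made-up stand-in data for `tidy(ap_lda, matrix = "beta")`, used only so the example runs on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for tidy(ap_lda, matrix = "beta"):
# one row per (topic, term) pair with that term's probability beta in the topic.
ap_topics <- tibble(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("aaron", "bank", "percent", "aaron", "bank", "percent"),
  beta  = c(0.0005, 0.002, 0.010, 0.0004, 0.008, 0.002)
)

beta_wide <- ap_topics %>%
  mutate(topic = paste0("topic", topic)) %>%          # 1 -> "topic1", 2 -> "topic2"
  pivot_wider(names_from = topic, values_from = beta) %>%
  filter(topic1 > .001 | topic2 > .001) %>%           # keep words that are not rare in both topics
  mutate(log_ratio = log2(topic2 / topic1))           # > 0: more likely in topic 2

beta_wide
```

With these toy values "aaron" is filtered out, "bank" gets a log ratio of 2 (four times more likely in topic 2) and "percent" gets a negative ratio.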

To plot the words with the smallest differences instead, change `10` to `-10` in the `top_n()` call: a negative `n` makes `top_n()` select the bottom rows.
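In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()` and `slice_min()`, which make the direction explicit instead of relying on the sign of `n`. A minimal sketch, using a made-up `beta_wide` with only the two columns the plot needs (`n = 2` instead of 10 to keep it small):

```r
library(dplyr)

# Hypothetical toy beta_wide with the columns used by the plotting code.
beta_wide <- tibble(
  term      = c("a", "b", "c", "d", "e", "f"),
  log_ratio = c(3, -4, 0.1, -0.2, 2, -1)
)

# Largest absolute differences on each side, like top_n(2, abs(log_ratio))
largest <- beta_wide %>%
  group_by(log_ratio > 0) %>%
  slice_max(abs(log_ratio), n = 2) %>%
  ungroup()

# Smallest absolute differences on each side, like top_n(-2, abs(log_ratio))
smallest <- beta_wide %>%
  group_by(log_ratio > 0) %>%
  slice_min(abs(log_ratio), n = 2) %>%
  ungroup()
```

Either result can be piped into the same `ggplot()` call. Like `top_n()`, both functions keep ties by default (`with_ties = TRUE`), so a group can return more than `n` rows.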

Anders Jørgensen