Add detected topics to input data

Question

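library(dplyr)
library(ggplot2)
library(stm)        # stm(); also provides the gadarian dataset
library(tidytext)
library(quanteda)

# Tokenize the free-text responses and build a document-feature matrix
testDfm <- gadarian$open.ended.response %>%
    tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
    dfm()

# Convert the dfm into the documents/vocab/meta lists that stm() expects
out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

# Fit a structural topic model with 5 topics
topic_model <- stm(documents = out$documents, vocab = out$vocab, K = 5)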

With these lines I can fit a topic model.

How can I use tidytext to find, for every row of the input data gadarian, which topic that row is linked to, and then add the topics to the input data?

Example of expected output

"MetaID" "treatment" "pid_rep"  "open.ended.response" "topic_number"

Update: code that shows an example of the expected output:

library(stm)
library(tidyr)
library(quanteda)

testDfm <- gadarian$open.ended.response %>%
    tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
    dfm()

out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

fittedModel <- stm(documents = out$documents, vocab = out$vocab, K = 5)

# findThoughts() returns the n most representative documents per topic;
# with n = 1 that is a single document for each of the 5 topics
documentMatches <- findThoughts(fittedModel, texts = gadarian$open.ended.response, n = 1)

# For each row, look up the topic (if any) that listed it as representative;
# rows not picked by any topic end up as NA
docTopics <- sapply(1:nrow(gadarian), function(docIndex) {
    names(documentMatches$index[documentMatches$index == docIndex][1])
})
gadarian$topic <- docTopics

I think there is not enough explanation of what you are trying to do. – Paolo Lorenzini, Dec 01 '20 at 16:17
Which one is your input dataset, gadarian? Based on your code, testDfm is a single data frame but the rest are lists. – Paolo Lorenzini, Dec 01 '20 at 16:30
By the way, do you want to add a column to gadarian$open.ended.response? – Paolo Lorenzini, Dec 01 '20 at 16:45
What kind of values do you want the new column to contain? Just NAs? – Paolo Lorenzini, Dec 01 '20 at 16:52
gadarian$topic contains topic information for only 5 rows; the rest is NA. Do you want the corresponding topic for each row? If yes, how should the topics be assigned? – Paolo Lorenzini, Dec 01 '20 at 17:06
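On how the topics could be assigned: a fitted stm model stores the per-document topic proportions in its theta element, a documents-by-topics matrix whose rows sum to 1. One common rule, assumed here rather than stated in the question, is to give each document the topic with the largest proportion. The sketch below applies that rule with base R to a small hypothetical matrix theta_toy, using made-up values that stand in for fittedModel$theta:

```r
# Hypothetical 3-document x 2-topic proportion matrix (each row sums to 1),
# standing in for fittedModel$theta; these are NOT values from the gadarian model
theta_toy <- matrix(c(0.7, 0.3,
                      0.2, 0.8,
                      0.4, 0.6),
                    nrow = 3, byrow = TRUE)

# Column index of the largest proportion in each row = dominant topic per document
dominant_topic <- apply(theta_toy, 1, which.max)
dominant_topic
#> [1] 1 2 2
```

With the real model the same call would be apply(fittedModel$theta, 1, which.max). That result lines up with the rows of gadarian only if every response is still present as a model document after preprocessing.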

score 2 · Answer 1 · answered Dec 01 '20 at 17:48


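install.packages("reshape2")
library(reshape2)
library(tidytext)
library(dplyr)
library(ggplot2)

# Word-topic probabilities (beta): one row per topic-term pair
td_beta <- tidy(fittedModel)
td_beta

# Plot the 10 highest-probability terms in each topic
td_beta %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

# Document-topic proportions (gamma): one row per document-topic pair
td_gamma <- tidy(fittedModel, matrix = "gamma",
                 document_names = rownames(gadarian))
td_gamma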

Paolo Lorenzini


Could this be useful? It gives word-topic and document-topic information. – Paolo Lorenzini Dec 01 '20 at 17:48
https://juliasilge.github.io/tidytext/reference/stm_tidiers.html – Paolo Lorenzini Dec 01 '20 at 17:51
Yes, I understand: you want to assign a topic to each of the 341 rows rather than have every row split across all topics. I will try to check it again. – Paolo Lorenzini Dec 01 '20 at 17:58
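Following up on that comment, one way to go from td_gamma to a single topic per row is to keep, for each document, the topic with the largest gamma and join it back onto gadarian. The sketch below is self-contained because it refits the question's K = 5 model. It rests on two assumptions that the thread does not state: that the dominant (highest-gamma) topic is the assignment you want, and that no response becomes empty after tokenization, so model document i is row i of gadarian (the stopifnot() check guards this). The output column is named topic_number to match the expected output; top_topic and gadarian_topics are illustrative names.

```r
library(stm)       # stm() and the gadarian dataset
library(quanteda)  # tokens(), dfm(), convert()
library(tidytext)  # tidy() methods for stm models
library(dplyr)

# Same preprocessing and model as in the question
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  dfm()
out <- convert(testDfm, to = "stm")
fittedModel <- stm(documents = out$documents, vocab = out$vocab,
                   K = 5, verbose = FALSE)

# Guard: each model document must correspond to exactly one row of gadarian
stopifnot(nrow(fittedModel$theta) == nrow(gadarian))

# Document-topic proportions, one row per (document, topic) pair
td_gamma <- tidy(fittedModel, matrix = "gamma",
                 document_names = rownames(gadarian))

# Keep the highest-gamma topic for each document;
# with_ties = FALSE guarantees exactly one row per document
top_topic <- td_gamma %>%
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(document, topic_number = topic, gamma)

# Attach the assignment to the original rows, in the requested column layout
gadarian_topics <- gadarian %>%
  mutate(document = rownames(gadarian)) %>%
  left_join(top_topic, by = "document") %>%
  select(MetaID, treatment, pid_rep, open.ended.response, topic_number)

head(gadarian_topics)
```

Keeping gamma in top_topic (and dropping it only in the final select) makes it easy to inspect how dominant the chosen topic actually is before relying on a single label per row.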
