I want to try a prediction approach similar to the one described here: https://www.quora.com/How-do-I-use-LDA-Latent-Dirichlet-Allocation-for-document-classification-preferably-with-solutions-that-can-be-implemented-in-R

I think I will have to merge my raw data with the doc_topic_distr table, using doc_id as the unique identifier, but I don't know how.

/edit: Will the doc_id persist, or does it become obsolete after the corpus creation / data frame transformation?

I've tried the following R code, but I don't know how to add the doc_id to it.

test <- doc_topic_distr

Any clues?

Flocke Haus

1 Answer

Solved it like this:

newDF <- merge(x = df_old, y = df_additions, by = "doc_id", all = TRUE)

Here df_old is the raw data and df_additions is the doc-topic distribution converted to a data frame. Both need a doc_id column for the merge to work.
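
For anyone wondering how the doc_id column might get into df_additions in the first place, here is a minimal sketch in base R. It assumes doc_topic_distr is a numeric matrix with one row per document and that its row names hold the doc_id; whether the row names survive depends on how the matrix was built, so check rownames(doc_topic_distr) first. The example data (d1, d2, d3, the label column, the topic_1/topic_2 names) is made up for illustration.

```r
# Hypothetical raw data: one row per document, keyed by doc_id.
df_old <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  label  = c("sports", "politics", "sports"),
  stringsAsFactors = FALSE
)

# Hypothetical doc-topic matrix: one row per document, one column per topic.
# Assumption: the row names hold the doc_id of each document.
doc_topic_distr <- matrix(
  c(0.7, 0.3,
    0.2, 0.8,
    0.5, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3"), c("topic_1", "topic_2"))
)

# Convert the matrix to a data frame and expose the row names as a doc_id column.
df_additions <- data.frame(
  doc_id = rownames(doc_topic_distr),
  doc_topic_distr,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Full outer join on doc_id; documents missing on either side get NA.
newDF <- merge(x = df_old, y = df_additions, by = "doc_id", all = TRUE)
```

If the matrix has no row names but its rows are in the same order as the raw data, doc_id = df_old$doc_id could be used instead of rownames(doc_topic_distr). That only works as long as nothing has dropped or reordered documents in between.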

Flocke Haus