
I have text data stored in two different formats: as a data frame and as a series of folders. Because of the storage type, I'm not sure I can post this question in a reproducible form.

I can create a corpus from each of these sources, as shown below, but how do I combine them into one corpus for use with the topicmodels package in R?

I've executed:

library(tm)  # provides Corpus(), DirSource() and DataframeSource()
dataA <- Corpus(DirSource(foldersA), readerControl = list(language = "eng"))
dataB <- Corpus(DataframeSource(dataframeB), readerControl = list(language = "eng"))

But I want to combine them into one unified corpus.

sabrina
  • You should be able to just use `dataBoth <- c(dataA, dataB)` – MrFlick Jul 30 '18 at 19:23
  • Ah, totally right, concatenation works here, but the base `c` combines the objects into a plain list. To get the correct method, I called `tm:::c.VCorpus`. Found help here: https://stackoverflow.com/questions/48224166/combine-corpora-in-tm-0-7-3 – sabrina Jul 30 '18 at 19:41
  • I don't think you should be calling that non-exported function directly. The `tm` package claims to already have overloaded methods for it: https://www.rdocumentation.org/packages/tm/versions/0.7-4/topics/tm_combine. Did you try `c()` first and it really didn't work? – MrFlick Jul 30 '18 at 19:44
  • I did try. It combined the two data sources, but into a list object, and that list can't subsequently be coerced into a DocumentTermMatrix – sabrina Jul 31 '18 at 01:33
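
For anyone landing here, the approach discussed in the comments can be sketched without calling the non-exported method. This is a minimal sketch, not a confirmed fix: it assumes that building both corpora with `VCorpus()` instead of `Corpus()` lets the `c()` method documented under `tm_combine` dispatch. The temporary folder, the file names and the contents of `dataframeB` are made-up stand-ins for the asker's data, and the `doc_id`/`text` column names are what `DataframeSource` expects in recent tm versions.

```r
library(tm)

# Hypothetical stand-in for foldersA: a temporary folder with two text files.
foldersA <- tempfile("foldersA")
dir.create(foldersA)
writeLines("first folder document", file.path(foldersA, "a1.txt"))
writeLines("second folder document", file.path(foldersA, "a2.txt"))

# Hypothetical stand-in for dataframeB: DataframeSource expects the first
# two columns to be named doc_id and text.
dataframeB <- data.frame(
  doc_id = c("b1", "b2"),
  text = c("first data frame row", "second data frame row"),
  stringsAsFactors = FALSE
)

# Build both corpora explicitly as VCorpus objects so they share a class.
dataA <- VCorpus(DirSource(foldersA), readerControl = list(language = "eng"))
dataB <- VCorpus(DataframeSource(dataframeB), readerControl = list(language = "eng"))

# c() on two VCorpus objects returns a single VCorpus rather than a list.
dataBoth <- c(dataA, dataB)

# The combined corpus can then be turned into a document-term matrix.
dtm <- DocumentTermMatrix(dataBoth)
```

If `class(dataA)` in the original code reports something other than `VCorpus`, that may be why the plain `c()` call fell back to a list; checking the class of both objects first is a cheap way to confirm.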

0 Answers