
I have noticed that DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) consumes far more memory than DocumentTermMatrix(myCorpus).

Why is this happening?

Any leads?

Here is the code snippet:

library(tm)
library(XML)
source("MyXMLReader.r") # defines myXMLReader, the custom XML reader
# basepath must already be defined (not shown here)
myCorpus <- Corpus(DirSource(paste(basepath, "corpus", sep = "")),
                   readerControl = list(reader = myXMLReader))
myDict <- readLines("some-file-containing-a-fixed-vocab") # each line becomes one dictionary term

Now here is my question:

dtm = DocumentTermMatrix(myCorpus) # takes very little extra RAM
dtm = DocumentTermMatrix(myCorpus, control = list(dictionary = myDict)) # takes a lot of RAM, which is not released even after dtm is formed

I suspect there is a memory leak, and possibly a bug.
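A minimal sketch of how the difference could be put into numbers, under some assumptions: `peak_mb` is a hypothetical helper (not part of tm), the `crude` sample corpus shipped with tm stands in for the real corpus, and `my_dict` is a made-up three-word vocabulary. It only reports what `gc()` sees, so memory held outside R's heap would not show up.

```r
# Hypothetical helper: extra peak memory (in Mb) that gc() reports
# while the expression passed as `expr` is evaluated.
peak_mb <- function(expr) {
  base <- gc(reset = TRUE)          # reset the "max used" counters
  force(expr)                       # evaluate the lazily passed expression
  after <- gc()
  # Last column of gc() is "max used" in Mb; column 2 is "used" in Mb.
  sum(after[, ncol(after)]) - sum(base[, 2])
}

# Only runs when tm is installed; `crude` is a sample corpus shipped with tm.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  data("crude")
  my_dict <- c("oil", "prices", "opec")   # assumed stand-in for the fixed vocabulary

  cat("no dictionary:  ", peak_mb(DocumentTermMatrix(crude)), "Mb\n")
  cat("with dictionary:",
      peak_mb(DocumentTermMatrix(crude, control = list(dictionary = my_dict))),
      "Mb\n")
}
```

Running the same two calls on the real corpus, and calling `gc()` once more after `rm(dtm)`, would show whether the memory is really retained.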

smci
Shivani Rao
  • 1
    Put your question into context. What packages you're using, what you're trying to do, what have you done so far to pinpoint the behavior... – Roman Luštrik Jul 11 '11 at 07:39
  • Thanks, Roman Luštrik. I am using the R text mining package (tm) to index a corpus. Here is the code snippet: library(tm); library(XML); source("MyXMLReader.r") # contains the myXML reader code; myCorpus <- Corpus(DirSource(paste(basepath,"corpus",sep="")),readerControl = list(reader = myXMLReader)); myDict = unlist(readLines("some-file-containing-a-fixed-vocab")) – Shivani Rao Sep 23 '11 at 17:03
  • 2
    Can you supply some data, please? No one can reproduce this without data. – richiemorrisroe Sep 23 '11 at 18:55
  • This was asked 5 years ago, don't know what version but it must be a very old version, doesn't state any numbers on memory usage, and no dataset so we can't reproduce. Voting to close as irreproducible... – smci Jul 20 '16 at 03:24

0 Answers