I am working on a qualitative analysis project using the tm package in R. I have built a corpus and created a term-document matrix; long story short, I need to edit the term-document matrix and conflate (merge) some of its rows. To do this I exported it from R using

write.csv()

I then imported the CSV file back into R, but I can't figure out how to get R to read it as a TermDocumentMatrix or DocumentTermMatrix.

I tried the following example code, to no avail.

R keeps reading my matrix as if it were a corpus, treating each cell as a single document.

# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE)
require(tm)
corp <- Corpus(DataframeSource(x))
dtm <- DocumentTermMatrix(corp)

Is there any way to import a CSV matrix so that it is read as a TermDocumentMatrix or DocumentTermMatrix, without R treating each cell as a document?

Asked by lrampe; edited by Peyman Mohamadpour.

2 Answers

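You're not reading documents, so skip the Corpus() step. Read the CSV with the first column (the names written by write.csv()) as row names so that x is purely numeric, then convert it directly:

x <- read.csv(file_loc, row.names = 1, check.names = FALSE)
myDTM <- as.DocumentTermMatrix(as.matrix(x), weighting = weightTf)

as.DocumentTermMatrix() treats the rows as documents; if the CSV was exported from a TermDocumentMatrix (terms as rows), call as.TermDocumentMatrix() with the same arguments instead.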

Next time, consider saving the TDM object to an .RData file: it won't require any conversion when you load it back, and it is also much more efficient than a CSV round trip.
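A minimal end-to-end sketch of the CSV round trip, assuming the CSV was written from a TermDocumentMatrix with write.csv(as.matrix(tdm), ...), so terms are rows and the first column holds the term names. The toy corpus, its text and the temporary file are hypothetical stand-ins for the real data. The parts that matter are read.csv(row.names = 1, check.names = FALSE), which keeps the terms as row names and the document names unmangled, and as.TermDocumentMatrix() applied to the numeric matrix.

```r
library(tm)

# Hypothetical toy corpus standing in for the real data
corp <- VCorpus(VectorSource(c("apple banana apple", "banana cherry")))
tdm <- TermDocumentMatrix(corp)

# Export: rows are terms, columns are documents, row names hold the terms
csv_file <- tempfile(fileext = ".csv")
write.csv(as.matrix(tdm), csv_file)

# ... edit or conflate rows in the CSV here ...

# Import: use the first column as row names so every remaining column is numeric,
# and keep document names as written (check.names = FALSE avoids an "X" prefix)
x <- read.csv(csv_file, row.names = 1, check.names = FALSE)

# Convert the numeric matrix straight to a TermDocumentMatrix; no Corpus() needed
tdm2 <- as.TermDocumentMatrix(as.matrix(x), weighting = weightTf)

# If a DocumentTermMatrix is needed instead, convert (transpose) the TDM
dtm2 <- as.DocumentTermMatrix(tdm2)

inspect(tdm2)
```

Reading the names into row names rather than leaving them as a data column is what keeps the matrix numeric; otherwise as.matrix() would produce a character matrix.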

Answered by Ken Benoit

If you want to preserve the format (class and structure) of any data, I would recommend using the save() function. You can save any R object to an .RData file, and retrieve it later with the load() function.
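A short sketch of the save()/load() round trip. To keep it base-R only, a plain term-by-document count matrix with hypothetical values stands in for the TermDocumentMatrix; save() and load() handle a tm object the same way and restore it with its class intact.

```r
# Hypothetical count matrix standing in for a TermDocumentMatrix
m <- matrix(c(2L, 1L, 0L, 0L, 1L, 1L), nrow = 3,
            dimnames = list(Terms = c("apple", "banana", "cherry"),
                            Docs = c("1", "2")))
original <- m

rdata_file <- tempfile(fileext = ".RData")
save(m, file = rdata_file)     # stores the object under its own name

rm(m)                          # simulate a fresh session
restored <- load(rdata_file)   # recreates `m`; returns the names it restored
print(restored)
```

load() brings objects back under the names they were saved with. For a single object, saveRDS() and readRDS() are an alternative that lets you pick the name when reading it back.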

Answered by John Smith