I have just created a Term Document Matrix in R but now I want to rename some of the terms.

For example:

vector <- "This is a test."

library(tm)

doc.vec <- VectorSource(vector)
doc.corpus <- Corpus(doc.vec)

TDM <- TermDocumentMatrix(doc.corpus)

Inspecting the TDM with inspect(TDM) gives:

           Docs
    Terms   1
      test. 1
      this  1

Now I want to rename, e.g., "test." to "anything". The reason is that when I mine my text, there are phrases like "big data" whose words obviously belong together. So as a first step, I use gsub to replace "big data" with "bigdata". In the final output, however, I want the term to appear as "big data" again.

Thanks in advance for any help.

Dat Tran

1 Answer

Here's one approach. It doesn't answer your first question directly, but it addresses the need you describe:

vector <- "This is a test.  I use big data.  That's George Washington!"

library(tm)
library(qdap)

vector2 <- space_fill(vector, c("big data", "George Washington"))

doc.vec <- VectorSource(vector2)
doc.corpus <- Corpus(doc.vec)

TDM <- TermDocumentMatrix(doc.corpus)
rownames(TDM) <- gsub("~~", " ", rownames(TDM))
inspect(TDM)

                    Docs
Terms                1
  big data.          1
  george washington! 1
  test.              1
  that's             1
  this               1
  use                1
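For the literal question of renaming a single term such as "test." to "anything", the same rownames() assignment can probably be used with a small lookup table instead of gsub. This is only a sketch: the `renames` vector is a hypothetical name introduced here for illustration, and it assumes the terms come out of TermDocumentMatrix spelled as in the question's output (a term that is not in the lookup table is simply left alone).

```r
library(tm)

# Same one-document corpus as in the question
doc.corpus <- Corpus(VectorSource("This is a test."))
TDM <- TermDocumentMatrix(doc.corpus)

# Hypothetical lookup table: names are the current terms, values are the new labels
renames <- c("test." = "anything")

# Replace only the terms that appear in the lookup table; keep the rest unchanged
terms <- rownames(TDM)
hit <- terms %in% names(renames)
terms[hit] <- unname(renames[terms[hit]])
rownames(TDM) <- terms

inspect(TDM)
```

Keeping the mapping in one named vector means several terms can be renamed in a single pass, and the counts in the matrix are untouched because only the row labels change.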
Tyler Rinker