I have a DocumentTermMatrix and I´d like to replace specific terms in this document and to create a frequency table.
The starting point is the original document as follows:
library(tm)
library(qdap)
df1 <- data.frame(word =c("test", "test", "teste", "hey", "heyyy", "hi"))
tdm <- as.DocumentTermMatrix(as.character(df1$word))
When I create a frequency table of the original document I get the correct results:
freq0 <- as.matrix(sort(colSums(as.matrix(tdm)), decreasing=TRUE))
freq0
So far so good. However, if replace some terms in the document then the new frequency table gets wrong results:
tdm$dimnames$Terms <- mgsub(c("teste", "heyyy"), c("test", "hey"), as.character(tdm$dimnames$Terms), fixed=T, trim=T)
freq1 <- as.matrix(sort(colSums(as.matrix(tdm)), decreasing=TRUE))
freq1
Obviously or perhaps some indexing in the document is wrong because even same terms are not regarded as identical while counting the terms.
This outcome should be the ideal case:
df2 <- data.frame(word =c("test", "test", "test", "hey", "hey", "hi"))
tdm2 <- as.DocumentTermMatrix(as.character(df2$word))
tdm2$dimnames$Terms <- mgsub(c("teste", "heyyy"), c("test", "hey"), as.character(tdm2$dimnames$Terms), fixed=T, trim=T)
freq2 <- as.matrix(sort(colSums(as.matrix(tdm2)), decreasing=TRUE))
freq2
Can anyone help me to figure out the problem?
Thx in advance