
I am counting the frequency of unique words in a text document in R 3.2.2. I have collapsed many articles into a single text document and built a corpus from it using the tm package.

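library(tm)

# Collapse all articles into one string and build a single-document corpus
desc <- paste(column_input, collapse = " ")
desrc <- VectorSource(desc)
decorp <- Corpus(desrc)
# Undecided which of these two to build:
# dedtm <- DocumentTermMatrix(decorp)
# dedtm <- TermDocumentMatrix(decorp)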

There are roughly 12,000 terms in that one text document. Before going further with matrix operations, I am not sure which is the better choice: a term-document matrix or a document-term matrix?

I assume it depends on context. Is it better to use a term-document matrix rather than a document-term matrix when there are few documents with many terms? I just want to understand the logic behind the choice, so I hope a reproducible example is not needed. Any suggestions would be greatly appreciated.
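For concreteness, here is a minimal sketch of what the two options look like on a single-document corpus. The sample text is a hypothetical stand-in for `column_input`, and the `wordLengths` control is only there so the toy example keeps short words. As far as I understand tm, the two matrices hold the same counts and differ only in orientation.

```r
library(tm)

# Hypothetical stand-in for the collapsed articles (not the real data)
desc <- paste(c("the cat sat on the mat", "the dog chased the cat"),
              collapse = " ")
decorp <- Corpus(VectorSource(desc))

# Keep words of any length so no term is dropped in this toy example
ctrl <- list(wordLengths = c(1, Inf))

tdm <- TermDocumentMatrix(decorp, control = ctrl)  # rows = terms, cols = documents
dtm <- DocumentTermMatrix(decorp, control = ctrl)  # rows = documents, cols = terms

# The dimensions are mirror images of each other
dim(tdm)
dim(dtm)

# Same counts, just transposed
same_counts <- all(as.matrix(tdm) == t(as.matrix(dtm)))

# Word frequencies: sum over documents, whichever way the matrix is oriented
freq_from_tdm <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
freq_from_dtm <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
```

With a single document either form is just one vector of counts, so for counting frequencies the choice seems to come down to which orientation the downstream function expects.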

Thanks in advance,

Bala

  • What is your end goal? Classification (binary? multiclass?), regression, topic modeling...? *"I just wanted to understand the logic behind this. So, I hope there is no need for any reproducible example."* Sorry, asking for a general tutorial or resource without asking any specific coding question is offtopic here and will get this question closed unless you rephrase it. – smci Aug 10 '16 at 01:12
