I am currently doing some text processing, and I would like to get the one- and two-letter words into a TermDocumentMatrix.
The problem is that it only seems to display words of three or more letters.
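```r
library(tm)
library(RWeka)

# A one-sentence test document containing one- and two-letter words
test <- 'This is a test.'
testmyCorpus <- Corpus(VectorSource(test))

# Build the term-document matrix, tokenizing with RWeka's AlphabeticTokenizer
testTDF <- TermDocumentMatrix(testmyCorpus, control = list(tokenize = AlphabeticTokenizer))
inspect(testTDF)
```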
Only the terms "this" and "test" are displayed; "is" and "a" are missing. Any ideas?
Thanks a lot for your help! Robert