R : Text Analysis - tm Package - stemComplete error

Question

Machine: Windows 7, 64-bit
R version: 3.1.2 (2014-10-31), "Pumpkin Helmet"

I am stemming some text for an analysis. Everything works up to the stemCompletion step. For more context, please see below.

Packages:

tm
SnowballC
rJava
RWeka
RWekajars
NLP

Sample list of words

test <- as.vector(c('win', 'winner', 'wins', 'wins', 'winning'))

Convert to Corpus

Test_Corpus <- Corpus(VectorSource(test))

Text manipulations

Test_Corpus <- tm_map(Test_Corpus, content_transformer(tolower))
Test_Corpus <- tm_map(Test_Corpus, removePunctuation)
Test_Corpus <- tm_map(Test_Corpus, removeNumbers)

Stemming using tm_map under the tm package

Test_stem <- tm_map(Test_Corpus, stemDocument, language = 'english')

Below is the result from stemming above, which is all correct so far:

win
winner
win
win
win

The problem appears when I try to use Test_Corpus as a dictionary to complete the stems back into full words, using the following code:

Test_complete <- tm_map(Test_stem, stemCompletion, Test_Corpus)

Below are the warning messages that I am getting:

Warning messages:

1: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
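The warning text itself says that the pattern handed to grep held more than one string, so whatever stemCompletion received as a single stem was not a single string. The sketch below reproduces that situation in base R only; it is not tm's code, and the names stems and dictionary are made-up illustration values.

```r
# Base R only; `stems` and `dictionary` are hypothetical illustration values.
dictionary <- c("win", "winner", "wins", "winning")
stems <- c("win", "winner")

# One string as the pattern: grep() runs without complaint.
single <- grep(sprintf("^%s", stems[1]), dictionary, value = TRUE)

# A length-2 vector as the pattern: grep() signals a condition
# (a warning in R 3.1.2, per the messages above). It is captured here
# rather than printed, so the message can be inspected.
msg <- tryCatch(
  {
    grep(sprintf("^%s", stems), dictionary, value = TRUE)
    NA_character_
  },
  condition = function(cond) conditionMessage(cond)
)

print(single)
print(msg)
```

If this reading is right, the thing to check is what tm_map passes to stemCompletion for each document, since the function is documented as taking a character vector of stems.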

I have tried several things suggested in previous posts, which other people with the same problem also tried with no luck. Below is a list of those things:

Updated Java
Used content_transformer
Used PlainTextDocument

I'm not sure your formatting is doing what you think it is. Indent for code blocks (including comments) and try to avoid overuse of headers. — Nathan Tuggy, Feb 20 '15 at 01:19

score 0 · Answer 1 · answered Feb 20 '15 at 03:17


I think you need to save your Test_Corpus as a dictionary before the stemming process. You could try something like corpus <- Test_Corpus, then do the stemming and use corpus later on in Test_complete <- tm_map(corpus, stemCompletion).
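As a rough illustration of the idea, here is a base R sketch that keeps the original words as a plain character vector before stemming and then completes one stem at a time. The helper complete_stems is hypothetical and is not tm's stemCompletion; its rule of picking the most frequent matching word is an assumption made for the example, and the stems are typed in by hand from the question rather than produced by stemDocument.

```r
# Hypothetical helper, NOT tm's stemCompletion(): completes each stem to the
# most frequent dictionary word that starts with it, one stem at a time.
complete_stems <- function(stems, dictionary) {
  vapply(stems, function(w) {
    # Plain prefix comparison, so no regex special characters are involved.
    hits <- dictionary[substr(dictionary, 1, nchar(w)) == w]
    if (length(hits) == 0) return(w)   # no match: keep the stem unchanged
    names(which.max(table(hits)))      # first most frequent match
  }, character(1), USE.NAMES = FALSE)
}

# Dictionary saved before stemming (the sample words from the question).
dictionary <- c("win", "winner", "wins", "wins", "winning")

# Stems as reported in the question, entered by hand.
stems <- c("win", "winner", "win", "win", "win")

completed <- complete_stems(stems, dictionary)
print(completed)
# "wins" "winner" "wins" "wins" "wins"
```

Because each call inside vapply sees exactly one string, the length > 1 pattern situation from the warnings cannot arise here.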


saldaihani


By changing the name of the corpus at the point of stemming, it does the same thing, right? – Jacob Johnston Feb 20 '15 at 20:12