How to use stemDocument in R?

Question

Update:

Thanks for the help; see the comments. The problem was related to the package version: once I removed the tolower step, stemming worked. I just need to find another way to lowercase the text.

Original question:

I am doing basic text mining on a list of documents. Everything works fine until I try to use stemDocument.

These are the tm_map steps I have already applied, using library(tm):

fbVec <- VectorSource(data[,1])
fbCorpus <- Corpus(fbVec)
fbCorpus <- tm_map(fbCorpus, tolower)
fbCorpus <- tm_map(fbCorpus, removePunctuation)
fbCorpus <- tm_map(fbCorpus, removeNumbers)
fbCorpus <- tm_map(fbCorpus, removeWords, stopwords("english"))
fbCorpus <- tm_map(fbCorpus, removeWords, "pr")
fbCorpus <- tm_map(fbCorpus, stripWhitespace)

The result is as follows:

[[1]]
[1]  easy post position search resumes improvement searching resumes

[[2]]
[1]  easy use good candidiates improvement allow multiple emails sent 

[[3]]
[1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school

[[4]]
[1]  abundance resumes

Then I tried to stem

library(SnowballC)    
fbCorpus <- tm_map(fbCorpus, stemDocument)

But the result is not what I expected. It looks like only the last word of each document gets stemmed:

[[1]]
[1]  easy post position search resumes improvement searching resum

[[2]]
[1]  easy use good candidiates improvement allow multiple emails sent 

[[3]]
[1]  applicants young kids absolutely sales experience waste time looking improvement applicants apply experience looking dont need kids just high school

[[4]]
[1]  abundance resum

Can anyone help?

I cannot replicate your result. Stemming with your data and code above works for me. Which versions of the `tm` and `SnowballC` packages do you have installed? `sessionInfo()` should tell you. – MrFlick Jun 19 '14 at 17:44
@MrFlick [1] SnowballC_0.5 textcat_1.0-2 RTextTools_1.4.2 SparseM_1.03 tm_0.6 NLP_0.1-3. It is so strange... – user3754216 Jun 19 '14 at 17:55
I ran it on tm 0.5.10. I helped someone before with tm 0.6, and it changed some things. I think the problem may be `tolower`. Can you try without it? – MrFlick Jun 19 '14 at 18:07
@MrFlick Oh, yes! It is tolower! I deleted it and it works! I don't know why. Thanks! I suppose now I just need another way to make everything lowercase :) – user3754216 Jun 19 '14 at 18:17
I've posted a workaround as an answer. Hopefully that works. (I'm not sure, since I'm not running 0.6 and can't test it.) – MrFlick Jun 19 '14 at 18:26

MrFlick · Accepted Answer · 2014-06-27T04:54:41.283

4

This problem appears in tm 0.6 and has to do with using functions that are not in the list returned by getTransformations() from tm. The problem is that tolower just returns a character vector, not a "PlainTextDocument" as tm_map expects. Later steps such as stemDocument then receive plain strings instead of documents, which is likely why only the end of each document was stemmed. The tm package provides the content_transformer function to take care of managing the PlainTextDocument:

fbCorpus  <- tm_map(fbCorpus, content_transformer(tolower))
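
For context, here is a minimal sketch of the whole pipeline from the question with this fix applied. It assumes tm 0.6 or later with SnowballC installed. The `docs` vector is a made-up stand-in for `data[,1]`, and `VCorpus` is used explicitly so that every element is a PlainTextDocument; neither comes from the original post.

```r
library(tm)         # assumes tm >= 0.6
library(SnowballC)  # stemmer backend used by stemDocument

# Hypothetical stand-in for data[,1] from the question
docs <- c("Easy to post a position and search resumes.",
          "An abundance of resumes!")

fbCorpus <- VCorpus(VectorSource(docs))

# Wrap base functions such as tolower in content_transformer so that
# each element stays a PlainTextDocument
fbCorpus <- tm_map(fbCorpus, content_transformer(tolower))

# Transformations listed by getTransformations() can be passed directly
fbCorpus <- tm_map(fbCorpus, removePunctuation)
fbCorpus <- tm_map(fbCorpus, removeNumbers)
fbCorpus <- tm_map(fbCorpus, removeWords, stopwords("english"))
fbCorpus <- tm_map(fbCorpus, stripWhitespace)
fbCorpus <- tm_map(fbCorpus, stemDocument)

# Print the stemmed text of each document
print(vapply(fbCorpus, as.character, character(1)))
```

The same wrapper should work for any other function that takes and returns a character vector, for example a custom gsub-based cleanup.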

edited Jun 27 '14 at 04:54

answered Jun 19 '14 at 18:20

MrFlick


user2481422 · Answer 2 · 2014-06-19T16:32:01.377

0

You are not loading your documents correctly. If your data is in an x.csv file, use the following:

    > x <- read.csv(file_loc, header = TRUE)  # where file_loc is the path to the csv file
    > x <- data.frame(lapply(x, as.character), stringsAsFactors = FALSE)

    > require(tm)
    Loading required package: tm

    > dd <- Corpus(DataframeSource(x))

    > inspect(dd)

Then simply apply stemDocument to the corpus, like this:

    > dd <- tm_map(dd, stemDocument)

edited Jun 19 '14 at 16:32

answered Jun 19 '14 at 16:26

user2481422


Please read my question, not just the title. I already used this; it just did not give the expected result. – user3754216 Jun 19 '14 at 16:29
Thanks, but my code is exactly the same as yours, and it did not work either. – user3754216 Jun 19 '14 at 16:58

score 0 · Answer 3 · edited Aug 14 '20 at 06:32

I had the same problem.

If you look at the arguments of stemDocument, you can specify the language used for stemming. I found that specifying "english" solved the problem for me:

fbCorpus <- tm_map(fbCorpus, stemDocument, language = "english")

edited Aug 14 '20 at 06:32

Darren Tsai


answered Aug 14 '20 at 00:54

Matthew Goldenberg
