How can I treat words (say "forest" and "forests") as one word, either "Forest" or "Forests", in R using the text mining (tm) package?

Question

I am trying to use R for text mining with the "tm" package. In my frequency plot, Forest and Forests are counted as two different words. How can I fix this? I would like a single count for Forests that is the sum of the counts for both forest and forests. Thanks.

[Frequency plot in R]

Possible duplicate of [R text mining - dealing with plurals](http://stackoverflow.com/questions/34938023/r-text-mining-dealing-with-plurals) – DJack, Mar 22 '17 at 14:34

score 3 · Answer 1 · answered Mar 22 '17 at 14:35

You can use a stemming function. The SnowballC package provides one: wordStem.

It reduces every word to its stem, so inflected forms of the same word (such as forest and forests) are counted as a single term.

Example

wordStem("forests")  # returns "forest"
wordStem("forest")   # returns "forest"
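As a minimal runnable sketch, assuming the SnowballC package is installed: wordStem is vectorised, so a whole character vector can be stemmed in one call. Lowercasing first is a precaution added here, not something the answer requires; with the default stemmer, all three inputs below reduce to the same stem.

```r
library(SnowballC)

# wordStem() is vectorised: it stems every element of the character vector
words <- c("forest", "forests", "Forest")

# Lowercase first so capitalised tokens are treated like their lowercase forms
stems <- wordStem(tolower(words))
print(stems)
```

Because every element maps to "forest", any frequency count built on the stemmed words will merge the three occurrences into one term.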

PinkFluffyUnicorn

I have already used SnowballC: library(SnowballC); docs <- tm_map(docs, stemDocument) – Shubham Sharma Mar 22 '17 at 16:00
If that does not work as expected, maybe have a look at this thread: http://stackoverflow.com/questions/24311561/how-to-use-stemdocument-in-r – PinkFluffyUnicorn Mar 23 '17 at 07:40
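If tm_map(docs, stemDocument) seems to have no effect, one likely cause (an assumption, since the asker's preprocessing is not shown) is the order of steps. A token with punctuation attached, such as "forests.", does not end in a plain "s", so the stemmer leaves it unchanged. The sketch below uses a small made-up corpus: it lowercases and strips punctuation before stemming, then sums counts from a TermDocumentMatrix so forest and forests end up as one term.

```r
library(tm)
library(SnowballC)

# A small, made-up corpus for illustration; replace with your own documents
texts <- c("The forest is green.",
           "Forests cover the hills.",
           "A forest of forests.")
docs <- VCorpus(VectorSource(texts))

# Normalise before stemming: lowercase, then drop punctuation so that
# tokens like "forests." become "forests" and can be stemmed
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)

# Reduce every word to its stem (stemDocument calls SnowballC::wordStem)
docs <- tm_map(docs, stemDocument)

# Build the term-document matrix and sum each term's counts across documents
tdm <- TermDocumentMatrix(docs)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(freq)
```

Here "forest" gets the combined count of every forest/forests occurrence, and "forests" no longer appears as a separate term. If you want full words rather than stems as plot labels, tm also provides stemCompletion() to map stems back to words from a dictionary.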
