I am trying to use R for text mining with the "tm" package. In the frequency plot below, "Forest" and "Forests" are counted as two different words. How can I correct this? I would prefer a single frequency count for "Forests" that is the sum of the counts for both "forest" and "forests". Thanks.
[Image: Frequency plot on R]
Possible duplicate of [R text mining - dealing with plurals](http://stackoverflow.com/questions/34938023/r-text-mining-dealing-with-plurals) – DJack Mar 22 '17 at 14:34
1 Answer
You can use a stemming function of some sort. SnowballC provides this functionality (the `wordStem` function). It will reduce all words to their stem.
Example
stem(forests) = forest
stem(forest) = forest
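As a rough, runnable sketch of the idea above (assuming the SnowballC and tm packages are installed): the sample sentences and the variable names `words`, `texts`, `docs`, `tdm` and `freq` are made up for illustration and are not from the question. The lowercasing and punctuation steps are an assumption about what a typical cleaning pipeline needs before stemming, not something the question confirms.

```r
library(SnowballC)
library(tm)

# Stem a plain character vector: singular and plural map to the same stem.
words <- c("forest", "forests")
stems <- wordStem(words, language = "english")
print(stems)

# Hypothetical sample documents (not taken from the question).
texts <- c("Forest cover is shrinking.", "Forests store carbon.")
docs <- VCorpus(VectorSource(texts))

# Lowercase and strip punctuation first, so that "Forest" and "Forests"
# reach the stemmer as clean lowercase tokens.
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)

# Stem every document in the corpus (stemDocument relies on SnowballC).
docs <- tm_map(docs, stemDocument)

# Build the term-document matrix and sum counts per term across documents.
tdm <- TermDocumentMatrix(docs)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(freq["forest"])
```

With this approach the plot label becomes the stem ("forest") rather than "Forests"; if the plural label matters, the names of `freq` can be renamed after counting.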

PinkFluffyUnicorn
I have already used SnowballC: `library(SnowballC)` followed by `docs <- tm_map(docs, stemDocument)` – Shubham Sharma Mar 22 '17 at 16:00
If that does not work as expected, maybe have a look at this thread: http://stackoverflow.com/questions/24311561/how-to-use-stemdocument-in-r – PinkFluffyUnicorn Mar 23 '17 at 07:40