1

I am trying to use R for text mining with the "tm" package. In the frequency plot, "Forest" and "Forests" are counted as two different words. How can I correct this? I would prefer a single frequency count for "Forests" that is the sum of the counts for both "forest" and "forests". Thanks. (Image: Frequency plot on R)

  • 2
    Possible duplicate of [R text mining - dealing with plurals](http://stackoverflow.com/questions/34938023/r-text-mining-dealing-with-plurals) – DJack Mar 22 '17 at 14:34

1 Answer

3

You can use a stemming function. The SnowballC package provides one: wordStem().

It reduces each word to its stem, so singular and plural forms such as "forest" and "forests" end up as the same term and are counted together.

Example

wordStem("forests")  # "forest"
wordStem("forest")   # "forest"
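Since the question uses tm, here is a minimal sketch of how stemming might be applied to a whole corpus before counting. The three sample sentences are invented for illustration, and the cleaning steps (lowercasing, punctuation removal) are assumptions about what your pipeline needs; tm's stemDocument() relies on SnowballC for the stemming itself.

```r
library(tm)         # corpus handling and term-document matrices
library(SnowballC)  # stemmer used by tm::stemDocument()

# Hypothetical sample documents, not from the question
docs <- c("The forest is dense.",
          "Forests cover the hills.",
          "A forest and more forests.")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))  # "Forests" -> "forests"
corpus <- tm_map(corpus, removePunctuation)             # "dense." -> "dense"
corpus <- tm_map(corpus, stemDocument)                  # "forests" -> "forest"

# Count terms across all documents after stemming
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

freq["forest"]  # combined count of "forest" and "forests"
```

Note that stems are not always dictionary words (for example, "dense" becomes "dens"), so the labels in a frequency plot may look truncated after stemming.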
PinkFluffyUnicorn