
I want to perform NLP tasks such as "Predict which Tweets are about real disasters and which ones are not" from Kaggle (https://www.kaggle.com/c/nlp-getting-started).

Which should I use to normalize my texts: lemmatization or stemming?

Thank you!

1 Answer


That depends on what you want to do.

Lemmatisation is linguistically motivated and generally more reliable at reducing an inflected word to its correct base form. However, it is more resource-intensive.

Stemming is (usually) a short procedure that uses string matching to remove parts of a string. It is much faster and doesn't need a lexicon, but the results aren't as accurate.

There is also a difference in output. Lemmatisation preserves the word class, so revolved is changed into revolve, while revolution remains unchanged (it is already the base form). Some stemming algorithms also remove derivational suffixes (-tion), so all of the above might end up as revol. This might be what you want, as it returns something like a 'stem', or base morpheme.
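
To make the difference concrete, here is a minimal toy sketch. The suffix list, the lexicon, and the function names `toy_stem` and `toy_lemmatise` are all made up for illustration and hand-picked for the revolve example; they are not the rules of any real stemmer or lemmatiser, which are far more elaborate.

```python
# Toy sketch: the suffix list and lexicon below are hand-picked assumptions,
# not the rules of any real stemmer or lemmatiser.

# Hypothetical suffixes for the stemmer, checked in this order (longest first).
SUFFIXES = ["ution", "ving", "ved", "ves", "ve"]

# Hypothetical mini-lexicon mapping inflected forms to their base form.
LEXICON = {
    "revolved": "revolve",
    "revolves": "revolve",
    "revolving": "revolve",
    "revolutions": "revolution",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix using plain string matching."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatise(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    word = word.lower()
    return LEXICON.get(word, word)


if __name__ == "__main__":
    for w in ["revolved", "revolve", "revolution"]:
        print(f"{w}: stem={toy_stem(w)}, lemma={toy_lemmatise(w)}")
```

With these toy rules, the stemmer maps all three words to revol, while the lemmatiser keeps revolve and revolution apart. Note that the stemmer needs only a handful of string rules, whereas the lemmatiser is only as good as its lexicon.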

Oliver Mason