8

Is it possible to change words like running, helping, cooks, finds and happily into run, help, cook, find and happy using nltk?

user3378734
  • I've used the stemming filter in whoosh (see more at https://pypi.python.org/pypi/Whoosh/2.6.0) – mpez0 Apr 12 '15 at 15:00
  • possible duplicate of [Porter Stemming of fried](http://stackoverflow.com/questions/27659179/porter-stemming-of-fried) – alvas Apr 12 '15 at 21:14

2 Answers

11

NLTK implements several stemming algorithms. For the words in your question, the Lancaster stemmer gives the results you want:

>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('happily')
'happy'
>>> st.stem('cooks')
'cook'
>>> st.stem('helping')
'help'
>>> st.stem('running')
'run'
>>> st.stem('finds')
'find'
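
Stemmers differ in how aggressively they strip suffixes, so it can help to compare a few on your own word list before picking one. The following is a minimal sketch, not part of the original answer: `default_stemmers` and `stem_table` are hypothetical helper names, and the NLTK imports are done lazily so that the comparison logic itself has no dependencies.

```python
def default_stemmers():
    """Build a few NLTK stemmers to compare (assumes nltk is installed)."""
    from nltk.stem import PorterStemmer, SnowballStemmer
    from nltk.stem.lancaster import LancasterStemmer
    return {
        'porter': PorterStemmer().stem,
        'snowball': SnowballStemmer('english').stem,
        'lancaster': LancasterStemmer().stem,
    }


def stem_table(words, stemmers):
    """Return {word: {stemmer_name: stem}} for every word and stemmer.

    `stemmers` maps a label to any callable that takes a word and
    returns its stem, so it is not tied to NLTK specifically.
    """
    return {
        word: {name: stem(word) for name, stem in stemmers.items()}
        for word in words
    }


# Usage (requires nltk):
# words = ['running', 'helping', 'cooks', 'finds', 'happily']
# for word, stems in stem_table(words, default_stemmers()).items():
#     print(word, stems)
```

Printing the table side by side makes it easy to spot words where one stemmer over-trims and another does not.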
Irshad Bhat
8
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> ls = ['running', 'helping', 'cooks', 'finds']
>>> [wnl.lemmatize(i) for i in ls]
['running', 'helping', u'cook', u'find']
>>> ls = [('running', 'v'), ('helping', 'v'), ('cooks', 'v'), ('finds','v')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
[u'run', u'help', u'cook', u'find']
>>> ls = [('running', 'n'), ('helping', 'n'), ('cooks', 'n'), ('finds','n')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
['running', 'helping', u'cook', u'find']
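
As the transcript shows, `lemmatize` treats a word as a noun unless a part of speech is passed, so verb forms such as 'running' only reduce when `pos='v'` is given. One way to avoid hard-coding the tag is to take it from a POS tagger. The sketch below assumes that approach: `penn_to_wordnet` and `lemmatize_tokens` are hypothetical helper names, and `nltk.pos_tag` needs NLTK's tagger model and the WordNet corpus to be downloaded beforehand with `nltk.download` (the exact resource names depend on the NLTK version).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the POS letter WordNetLemmatizer expects.

    Anything that is not an adjective, verb, or adverb falls back to
    'n', which is also the lemmatizer's own default.
    """
    if tag.startswith('J'):
        return 'a'  # adjective
    if tag.startswith('V'):
        return 'v'  # verb
    if tag.startswith('R'):
        return 'r'  # adverb
    return 'n'      # noun and everything else


def lemmatize_tokens(tokens):
    """Tag a list of tokens, then lemmatize each with its tagged POS.

    Assumes nltk is installed and its tagger and WordNet data are
    available locally.
    """
    import nltk
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    return [
        wnl.lemmatize(word, penn_to_wordnet(tag))
        for word, tag in nltk.pos_tag(tokens)
    ]


# Usage (requires nltk and its data):
# print(lemmatize_tokens(['He', 'was', 'running', 'and', 'helping']))
```

The tagger works best on tokens in sentence context; for isolated words its guesses are less reliable, so the result is only as good as the tags.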

See also [Porter Stemming of fried](http://stackoverflow.com/questions/27659179/porter-stemming-of-fried).

alvas
  • This should be marked as "correct". "lemmatize" is more accurate than "LancasterStemmer", since for the word "wires", "LancasterStemmer" returns "wir" and "lemmatize" returns the correct one, "wire". – kennyut Jul 15 '22 at 14:14