Is it possible to change words like running, helping, cooks, finds and happily into run, help, cook, find and happy using nltk?
8
-
I've used the stemming filter in whoosh (see more at https://pypi.python.org/pypi/Whoosh/2.6.0) – mpez0 Apr 12 '15 at 15:00
-
possible duplicate of [Porter Stemming of fried](http://stackoverflow.com/questions/27659179/porter-stemming-of-fried) – alvas Apr 12 '15 at 21:14
2 Answers
11
There are some stemming algorithms implemented in nltk. It looks like the Lancaster stemming algorithm will work for you.
>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('happily')
'happy'
>>> st.stem('cooks')
'cook'
>>> st.stem('helping')
'help'
>>> st.stem('running')
'run'
>>> st.stem('finds')
'find'

Irshad Bhat
8
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> ls = ['running', 'helping', 'cooks', 'finds']
>>> [wnl.lemmatize(i) for i in ls]
['running', 'helping', u'cook', u'find']
>>> ls = [('running', 'v'), ('helping', 'v'), ('cooks', 'v'), ('finds','v')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
[u'run', u'help', u'cook', u'find']
>>> ls = [('running', 'n'), ('helping', 'n'), ('cooks', 'n'), ('finds','n')]
>>> [wnl.lemmatize(word, pos) for word, pos in ls]
['running', 'helping', u'cook', u'find']
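
The session above shows that `lemmatize` treats a word as a noun unless a part of speech is passed, which is why 'running' and 'helping' come back unchanged without 'v'. If the part of speech is not known in advance, one common approach is to run a tagger first and translate its tags. Below is a minimal sketch of that translation, assuming Penn Treebank style tags (the kind `nltk.pos_tag` produces); `penn_to_wordnet` and `lemmatize_tagged` are hypothetical helper names, not part of nltk.

```python
# Hypothetical helpers (not part of nltk): translate Penn Treebank tags
# into the single-letter POS codes that WordNetLemmatizer.lemmatize accepts.

def penn_to_wordnet(tag, default="n"):
    """Return 'a', 'v', 'n' or 'r' for a Penn Treebank tag.

    Tags outside these four families fall back to `default`
    (noun, which is also the lemmatizer's own default).
    """
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("NN"):
        return "n"  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return default


def lemmatize_tagged(tagged_words, lemmatize):
    """Call lemmatize(word, pos) for each (word, penn_tag) pair.

    `lemmatize` is any callable with that signature, for example
    WordNetLemmatizer().lemmatize.
    """
    return [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged_words]
```

With nltk installed and the tagger and WordNet data downloaded, this could be used as `lemmatize_tagged(nltk.pos_tag(ls), wnl.lemmatize)`. The result then depends on how the tagger labels each word, so isolated words without sentence context may still be tagged as nouns.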
-
This should be marked as "correct". "lemmatize" is more accurate than "LancasterStemmer", since for the word "wires", "LancasterStemmer" returns "wir" and "lemmatize" returns the correct one, "wire". – kennyut Jul 15 '22 at 14:14