9

I am new to Python and practising with examples from a book.
Can anyone explain why nothing changes when I try to stem some example text with this code?

>>> from nltk.stem import PorterStemmer
>>> stemmer=PorterStemmer()
>>> stemmer.stem('numpang wifi stop gadget shopping')
'numpang wifi stop gadget shopping'

But when I do this, it works:

>>> stemmer.stem('shopping')
'shop'
BenMorel
Aikin
  • When I enter the whole text, it doesn't stem "shopping": `stemmer.stem('numpang wifi stop gadget shopping')` returns `'numpang wifi stop gadget shopping'` – Aikin Oct 19 '12 at 12:21
  • Does it work when you call it like this: `[stemmer.stem(w) for w in ['numpang', 'wifi', 'stop', 'gadget', 'shopping']]`? If so, the `stem` function appears to work on a single word at a time. – jro Oct 19 '12 at 12:25
  • Yes, it seems that I have to split the text into words. So there is no other way to stem text without splitting it? – Aikin Oct 19 '12 at 12:29
  • @Aikin: apparently. I don't know your implementation of the algorithm, but I would find it the most sensible implementation. Using that, you can expand it to fit your needs. If this is what was your issue, I see Samuele has already expanded my comment into an answer: I'd accept it, and build on that. – jro Oct 19 '12 at 12:31
  • @jro: sorry didn't see your comment, i'd have canceled my answer and told you to post it since apparently you were first by seconds :( btw yes, the problem is just that... that algorithm is a single-word stemmer (gave just a quick peek to the code) and the split method is the only option available here – Samuele Mattiuzzo Oct 19 '12 at 12:33
  • @SamueleMattiuzzo: no worries :). – jro Oct 19 '12 at 12:36

3 Answers

13

try this:

res = ",".join([ stemmer.stem(kw) for kw in 'numpang wifi stop gadget shopping'.split(" ")])

The problem is probably that the stemmer works on single words. Your whole string, treated as one word, has no "root", while the single word "shopping" has the root "shop". So you'll have to stem each word separately.

edit:

From their source code:

Stemming algorithms attempt to automatically remove suffixes (and in some
cases prefixes) in order to find the "root word" or stem of a given word. This
is useful in various natural language processing scenarios, such as search.

So I guess you are indeed forced to split the string yourself.
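A minimal sketch of the split, stem, and rejoin idea as a reusable helper. The names `stem_text` and `stem_word` are made up for illustration and are not part of NLTK; the helper takes any single-word stemming function, so it does not depend on a particular stemmer.

```python
from typing import Callable


def stem_text(text: str, stem_word: Callable[[str], str]) -> str:
    """Stem each whitespace-separated word and rejoin the results with spaces.

    `stem_word` is any function that stems ONE word at a time.
    """
    # str.split() with no argument splits on any run of whitespace,
    # so repeated spaces or tabs do not produce empty "words".
    return " ".join(stem_word(word) for word in text.split())


# Assuming NLTK is installed, the Porter stemmer's bound method can be passed in:
#   from nltk.stem import PorterStemmer
#   print(stem_text("numpang wifi stop gadget shopping", PorterStemmer().stem))
```

Note that this treats punctuation attached to a word (for example "shopping,") as part of the word, so a tokenizer may be a better fit for real text.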

Samuele Mattiuzzo
4

Stemming is the process of reducing an inflected word to its base (root) form. Here you are trying to stem an entire sentence at once, but the stemmer expects one word at a time.

Follow these steps:

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
sentence = "numpang wifi stop gadget shopping"
tokens = word_tokenize(sentence)
stemmer = PorterStemmer()

output = [stemmer.stem(word) for word in tokens]
1

Try this:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()

some_text = "numpang wifi stop gadget shopping"

words = word_tokenize(some_text)

for word in words:
    print(stemmer.stem(word))
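If the goal is to get stemmed text back as a single string, rather than a list or printed lines, one possible approach is to substitute each word in place with a regular expression, which leaves spacing and punctuation untouched. This is only a sketch: `stem_in_place` and `WORD_RE` are hypothetical names, and the pattern assumes plain ASCII letters.

```python
import re
from typing import Callable

# Assumption: a "word" is a run of ASCII letters; adjust the pattern for other text.
WORD_RE = re.compile(r"[A-Za-z]+")


def stem_in_place(text: str, stem_word: Callable[[str], str]) -> str:
    """Replace every word in `text` with its stem, keeping everything else as is."""
    # re.sub calls the lambda once per match and inserts the returned string.
    return WORD_RE.sub(lambda match: stem_word(match.group(0)), text)


# Assuming NLTK is installed, usage could look like:
#   from nltk.stem import PorterStemmer
#   print(stem_in_place("numpang wifi stop gadget shopping", PorterStemmer().stem))
```

The stemmer is still applied one word at a time; the regular expression just does the splitting and rejoining for you.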