What is the point of the Porter Stemmer algorithm having a rule that converts SS to SS?

1 Answer
Imagine the rule `SS->SS` were not in the algorithm. Words like `caress` would then not be recognized at all, and it would seem that the algorithm can't do anything to reduce them to a stem. With the rule `SS->SS`, however, the stemmer says: "I recognize the word `caress` and I reduce it to `caress`. I'm done." The alternative would be: "I can't do anything." Of course this is fictitious work, but what matters is that it increases the precision of the stemmer. You can see this when the algorithm is tested: without this rule the results would be different (worse). Look at the word list [ridiculousness, caress].
Case 1. Rule `SS->SS` in the algorithm.
Stemming:
caress (Step 1a) -> caress OK
ridiculousness (Step 2) -> ridiculous (Step 4) -> ridicul OK
Success rate: 100%
Case 2. Rule `SS->SS` not in the algorithm.
Stemming:
caress -> fail
ridiculousness (Step 2) -> ridiculous (Step 4) -> ridicul OK
Success rate: 50%
From a practical point of view, this rule doesn't matter. It's just a formalism.
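
To make Step 1a concrete, here is a minimal sketch of its rule selection. It assumes the usual Step 1a rule list (`SSES->SS`, `IES->I`, `SS->SS`, `S->` nothing) and that the rule with the longest matching suffix wins. The names `STEP_1A_RULES`, `step_1a`, and `without_identity` are made up for this illustration and do not come from any particular library.

```python
# Step 1a rules as (suffix, replacement) pairs; assumed list, see the note above.
STEP_1A_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def step_1a(word, rules=STEP_1A_RULES):
    """Apply the rule with the longest matching suffix.

    Returns the word unchanged if no rule matches.
    """
    # Try longer suffixes first so that "ss" is checked before "s".
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


# The same rule list with the identity rule SS->SS taken out.
without_identity = [rule for rule in STEP_1A_RULES if rule != ("ss", "ss")]

print(step_1a("caress"))                    # caress
print(step_1a("caress", without_identity))  # cares
```

In this sketch the identity rule matches `caress` before the shorter `S->` rule gets a chance. With the identity rule removed, the `S->` rule is the longest match and strips one `s`, so what happens to `caress` depends on which other rules are present in the step.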

- Are you saying that the algorithm would not proceed to the next steps without this "operation"? – CodyBugstein Oct 07 '15 at 18:23
- Certainly not! Look at the word `ridiculousness`. Assume `SS->SS` were not in the algorithm. The word would be processed no earlier than `Step 2`. But for some words ending with `SS`, like `caress`, the algorithm would output `fail`: it could not find a rule. This would ultimately decrease the success rate. – sve Oct 07 '15 at 18:28
- So it's the equivalent of `SS -> do nothing but note that it was detected`? – CodyBugstein Oct 07 '15 at 18:41
- @Imray that's right! You should note that these kinds of things are just formalisms. In the end, it doesn't matter. But if you have `SS->SS`, you gain something, since your algorithm deals with more cases. – sve Oct 07 '15 at 18:42
- @sve As far as I understand, what you wrote is not completely true. In step 1a there's the rule `S -> `, so the word `caress` (assuming there's no identity rule) would end up as `cares` (which is still wrong, but the algorithm does not fail). – noamgot Dec 27 '18 at 19:59
- If it just increases the success rate, then what stops us from adding more identity rules to match as many words as possible? – Deil Jun 15 '19 at 10:18