Questions tagged [porter-stemmer]

An algorithm designed to remove common morphological and inflectional endings from English words.

The algorithm was developed in 1979 in Cambridge and has since been ported to many different languages.
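To make "removing inflectional endings" concrete, here is a minimal sketch of step 1a of the Porter algorithm only, which handles plural endings. The function name `porter_step_1a` is made up for this illustration, and the remaining steps of the algorithm (which depend on the consonant/vowel "measure" of the stem) are deliberately left out, so this is not a full stemmer.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter algorithm (plural endings).

    Hypothetical helper for illustration; the real algorithm continues
    with several further steps that are not implemented here.
    """
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS, e.g. caress stays caress
    if word.endswith("s"):
        return word[:-1]   # S -> (nothing), e.g. cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats", "communities"]:
    print(w, "->", porter_step_1a(w))
```

Even this single step turns "communities" into "communiti", which hints at why several of the questions below ask about stems that are not real words.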

132 questions
36
votes
3 answers

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I need some help now stemming the resulting word list to avoid duplicates. Example: Community / Communities I've used an…
Dave
  • 828
  • 1
  • 13
  • 18
29
votes
7 answers

Stemming English words with Lucene

I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit". The function looks like: String stemTerm(String term){ ... } I've found the Lucene Analyzer,…
Mulone
  • 3,603
  • 9
  • 47
  • 69
15
votes
2 answers

nltk stemmer: string index out of range

I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside of a django app view. However, when stemming the documents inside the django…
jkarimi
  • 1,247
  • 2
  • 15
  • 27
12
votes
4 answers

The reverse process of stemming

I use a Lucene Snowball analyzer to perform stemming. The results are not meaningful words. I referred to this question. One of the solutions is to use a database that contains a map between the stemmed version of the word to one stable version of…
CTsiddharth
  • 907
  • 12
  • 21
12
votes
7 answers

Is there a Java implementation of the Porter2 stemmer?

Do you know of any Java implementation of the Porter2 stemmer (or any better stemmer written in Java)? I know that there is a Java version of Porter (not Porter2) here: http://tartarus.org/~martin/PorterStemmer/java.txt but on…
Bikash Gyawali
  • 969
  • 2
  • 15
  • 33
10
votes
5 answers

I want a Java Arabic stemmer

I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it makes unwanted changes to the form of words. Is there any other stemmer for Arabic?
Kareem Hashem
  • 121
  • 1
  • 3
9
votes
3 answers

PorterStemmer doesn't seem to work

I am new to Python and practising with examples from a book. Can anyone explain why nothing changes when I try to stem some examples with this code? >>> from nltk.stem import PorterStemmer >>> stemmer=PorterStemmer() >>> stemmer.stem('numpang…
Aikin
  • 319
  • 2
  • 5
  • 13
6
votes
1 answer

"Opposite" of Porter Stemmer algorithm?

I'm looking for some way of performing the opposite of a Porter Stemmer algorithm, i.e. the string "search" would return an array "searches, searched, searching", etc. Does something like this exist already (preferably in PHP)? Thank you for your help!
Fred
  • 1,021
  • 5
  • 13
  • 29
5
votes
2 answers

Do stemming and fuzzy search work together in Apache Solr?

I am using the Porter filter factory for a field which has 3 to 4 words in it. E.g.: "ABC BLOSSOM COMPANY". I expect to fetch the above document when I search for ABC BLOSSOMING COMPANY as well. When I query this: name:ABC AND name:BLOSSOMING AND…
Bhavana67
  • 116
  • 8
5
votes
1 answer

Stemming option in stanfordcorenlp

Problem: Is there an option to stem the words using stanford-core-nlp? I am not able to find one! I am using the stanford-corenlp-3.5.2.jar. Code: public class StanfordNLPTester { public static void main (String args[]){ String paragraph =…
raikumardipak
  • 1,461
  • 2
  • 29
  • 49
4
votes
2 answers

Solr Snowball stemmer is inconsistent with Spanish

I have this stemmed field:
Chewie
  • 7,095
  • 5
  • 29
  • 36
4
votes
2 answers

Is there an implementation of a Croatian word stemming algorithm?

I'm searching for an implementation of a Croatian word stemming algorithm. Ideally in Java, but I would also accept any other language. Is there somewhere a community of English-speaking developers who are developing search applications for the…
Chris
  • 15,429
  • 19
  • 72
  • 74
4
votes
3 answers

Stop words and stemmer in Java

I'm thinking of putting stop words in my similarity program and then a stemmer (going for Porter 1 or 2, depending on which is easiest to implement). I was wondering that since I read my text from files as whole lines and save them as a long string, so…
N00programmer
  • 1,111
  • 4
  • 13
  • 17
4
votes
1 answer

Is there a tool to get all derivatives of a word in PHP?

I need to input "face" and get "facial, faces, faced, facing, facer, faceable" etc. I've come across some ineffective programs which do the opposite, such as SNOWBALL and a couple of Porter Stemming PHP scripts which don't seem to work. I'm…
user734063
  • 569
  • 1
  • 5
  • 13
4
votes
1 answer

StandardAnalyzer with stemming

Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy/paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there any smarter way? Also, if I would…
Kobe-Wan Kenobi
  • 3,694
  • 2
  • 40
  • 67