Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.

531 questions
0
votes
1 answer

stem words and create index without stop words using Lucene 4.0

I have the following problem: there are several text documents that I need to parse and index, but without stop words and with the terms stemmed. I can do it manually, but I heard from a colleague about Lucene, which can do it automatically. I…
user1864229
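
To make the task in this question concrete, here is a minimal Python sketch of building an inverted index with stop words dropped and terms stemmed. It is not Lucene code: `toy_stem`, `STOP_WORDS`, and `build_index` are hypothetical names invented for this illustration, and the suffix stripper is only a stand-in for a real stemmer.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are"}

def toy_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Map each stemmed, non-stop-word term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in STOP_WORDS:
                continue  # stop words never reach the index
            index[toy_stem(token)].add(doc_id)
    return dict(index)

docs = {
    "d1": "The cats are jumping",
    "d2": "A cat jumped to the roof",
}
index = build_index(docs)
print(index)
```

In this sketch "cats" and "cat" land under the same index key, as do "jumping" and "jumped", while "the" and "are" are never indexed. A library analyzer would perform the same three steps (tokenize, filter stop words, stem) with far better rules.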
0
votes
1 answer

Ruby Lingua::Stem alternative

Is there a free alternative to Perl's Lingua::Stem module that can handle Russian? Thanks
Fluffy
0
votes
1 answer

Stemming Algorithm

I have a question about the Porter stemming algorithm. I researched on the internet, but I couldn't find the difference between understemming and overstemming. Also, does the Porter algorithm understem or overstem? Do you have any idea? Thanks…
aldimeola1122
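
The two terms in this question can be shown with a small Python sketch. Overstemming means unrelated words are reduced to the same stem; understemming means related words end up with different stems. The `toy_stem` function below is a hypothetical suffix stripper written for this illustration, not the Porter algorithm, although the word groups are ones commonly used to illustrate these errors.

```python
def toy_stem(word):
    """Strip the first matching suffix; deliberately crude to expose both error types."""
    for suffix in ("ity", "al", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Overstemming: three words with different meanings collapse to one stem.
overstemmed = {w: toy_stem(w) for w in ("universe", "university", "universal")}

# Understemming: two forms of the same word keep different stems.
understemmed = {w: toy_stem(w) for w in ("alumnus", "alumni")}

print(overstemmed)
print(understemmed)
```

A single stemmer can make both kinds of error on different words, so "understemming or overstemming" is usually a matter of degree rather than a property an algorithm either has or lacks.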
0
votes
1 answer

sphinx search: how to get the frequency word list that are stemmed?

I'm trying to get the frequency list of words from the indexer command-line tool, but I get the words unstemmed, although I set morphology = stem_en in the index settings and search itself works fine on words with the same stem. Is there a way to get…
lompy
0
votes
1 answer

How do I configure and use KStem in Java?

I want to stem the words in my document and have zeroed in on KStem. I am working in Eclipse and have configured Lucene by downloading the lucene-core jar file to the lib folder and adding it to the build path. I similarly did this for the KStem jar…
abhishek
0
votes
1 answer

Krovetz stemming algorithm (KStemming) help needed

Could you please explain the Krovetz stemming algorithm (KStemming) to me? I want to know how it works. Thanks in advance
jeyaprakash
0
votes
1 answer

ElasticSearch stemming with protected words

I'm using ElasticSearch (via Ruby, Tire) for a search feature on an ecommerce clothing website. I need a stemming filter, BUT I also need to be able to specify a list of protected words which do not get stemmed. Currently I'm using the snowball…
awhitworth
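
The idea of protected words in this question can be sketched in a few lines of Python: check each token against a protected set before handing it to the stemmer. This is not ElasticSearch or Tire configuration. `PROTECTED`, `toy_stem`, and `stem_with_protection` are hypothetical names, and the clothing terms are made-up examples.

```python
# Hypothetical protected list: words that must reach the index unchanged.
PROTECTED = {"jeans", "dress"}

def toy_stem(word):
    """Strip one common suffix; a stand-in for a real stemming filter."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_with_protection(tokens, protected=PROTECTED):
    """Stem every token except those listed as protected."""
    return [t if t in protected else toy_stem(t) for t in tokens]

tokens = ["jeans", "shirts", "dress", "boots"]
result = stem_with_protection(tokens)
unprotected = [toy_stem(t) for t in tokens]
print(result)
print(unprotected)
```

Without the protected set, this toy stemmer would turn "jeans" into "jean" and "dress" into "dres". With it, those two tokens pass through untouched while "shirts" and "boots" are still stemmed.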
0
votes
2 answers

Solr - Wild Card Search varies with Stemming Methods

I have two versions of Solr running on my machine, say SolrVer1 and SolrVer2. SolrVer1 applies the stemming methods below on the field type text_en_splitting
meghana
0
votes
2 answers

How do I read in an editable file that contains words that I don't want stemmed using Lingua::Stem's add_exceptions($exceptions_hash_ref) in Perl?

I am using Perl's Lingua::Stem module and I want a text file, or some other editable file format, to hold a list of words I do not want stemmed. I want to be able to add words to the file at any time. Their example…
DemiSheep
0
votes
1 answer

Stemming + wildcarding: unexpected effects

I am editing a Lucene.NET implementation (2.3.2) at work to include stemming and automatic wildcarding (adding * at the ends of words). I have found that exact words with wildcarding don't work (so stack* works for stackoverflow, but…
Sean
0
votes
1 answer

How can I search a single word in apache Solr?

I am using Apache Solr for indexing with DataImportHandler. The document structure is as follows: id (long), title (text), abstract (text), pubDate (date). I combined title and abstract into one field for text searching. My problem is when I query "title:…
milind_db
0
votes
1 answer

Customizing KStem filter in Solr

I'm evaluating a switch of stemming filters in Solr from Porter to KStem. I see references to the ability to configure KStem via a direct_conflations.txt file and other files, but I can't seem to find documentation on how this file should be…
Reggie Pharkle
0
votes
1 answer

Drools for Morphological Analysis

Is Drools suitable for writing rules for stemming and/or POS tagging? Suggestions for a better rule language are welcome. I read many papers in this field that use the rule-based approach, but none of them mentioned what library or framework was…
omarzd
0
votes
1 answer

Need explanation on Language Stemmer of Solr

I'm using Nutch with Solr to develop a search engine for Arabic texts. I need to implement a stemmer for my Arabic texts, and while searching for Solr stemmers I found that it provides these two filters
sakurami
0
votes
1 answer

R's text mining package... adding a new function to getTransformation

I am attempting to add a new stemmer that works using a table lookup method. If h is the hash that contains the stemming operation, it is encoded as follows: keys are words before stemming and values are words after stemming. I would like to ideally…
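
The table-lookup stemmer described in this question can be sketched in Python as a dictionary whose keys are surface forms and whose values are stems. This shows only the lookup idea and is not the API of R's text mining package. The table contents and the names `lookup_stem` and `stem_document` are hypothetical.

```python
# Hypothetical lookup table h: word before stemming -> word after stemming.
h = {"running": "run", "ran": "run", "mice": "mouse"}

def lookup_stem(word, table):
    """Return the table's stem for word, or the word itself if it is not listed."""
    return table.get(word, word)

def stem_document(text, table):
    """Apply the lookup stemmer to every whitespace-separated token."""
    return " ".join(lookup_stem(w, table) for w in text.split())

stemmed = stem_document("mice ran while running", h)
print(stemmed)
```

Falling back to the original word for unknown keys is a design choice in this sketch; a transformation that should flag unknown words would need a different default.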