Questions tagged [stemming]

The process for reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
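As a rough illustration of the definition above, here is a minimal sketch of a suffix-stripping stemmer. It is a toy, not the Porter or Snowball algorithm and not what Elasticsearch, Solr, Sphinx, or NLTK implement; the `SUFFIXES` list and the `toy_stem` function are hypothetical names made up for this example.

```python
# Toy suffix-stripping stemmer: illustrative only, not Porter or Snowball.
# SUFFIXES is a hypothetical rule list, ordered so longer suffixes are tried first.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix,
    keeping at least three characters of the word."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


if __name__ == "__main__":
    for w in ["Test", "Tests", "Tester", "Testers", "Testing", "built"]:
        print(w, "->", toy_stem(w))
```

Because the rules only look at suffixes, an irregular form such as 'built' is left unchanged and prefixes are never touched. Real stemmers use far more careful rules, so treat this only as a picture of the general idea.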

531 questions
4
votes
1 answer

Elasticsearch: singular and plural results

We have used the minimal_english stemmer filter in our mapping. This is to ensure that only singular and plural forms are searchable, and not similar words. E.g. Test and Tests should be searchable on entering the term Test, but Tester, Testers, Testing should…
Himadri Pant
4
votes
0 answers

Multi-language stemming in Haystack with ElasticSearch

I'd like to set the stemming language on a per-user basis in Django Haystack with ElasticSearch as the backend. In our Django model, we have image objects that contain a comma-separated tag charfield for each of English, Spanish, German, ...: tags_en, tags_es,…
Simon Steinberger
3
votes
3 answers

Slovenian stemmer for Sphinx

I am searching for a stemming algorithm for the Slovenian language that I can use with Sphinx search. What I'm trying to achieve is, for example, when searching for 'jabolka', I also want results for documents containing 'jabolko', 'jabolki', 'jabolk', etc. I…
KoviNET
3
votes
1 answer

Neither stemmer nor lemmatizer seem to work very well, what should I do?

I am new to text analysis and am trying to create a bag-of-words model (using sklearn's CountVectorizer method). I have a data frame with a column of text with words like 'acid', 'acidic', 'acidity', 'wood', 'woodsy', 'woody'. I think that 'acid' and…
3
votes
1 answer

In Solr, why is 'built' not being stemmed to 'build' but 'building' is?

I'm trying to figure out two things in this posting: (1) Why is 'built' NOT being stemmed to 'build' even though the field type definition has a stemmer defined? However, 'building' is being stemmed to 'build'. (2) How to use Luke to examine the index to…
jabawaba
3
votes
2 answers

Avoid slow highlighting on Solr because of stemming

I am quite new to Solr and would like to ask for your help. I am developing an application which should be able to highlight the results of a query. For this I am using the regex fragmenter:
oroszgy
3
votes
2 answers

One word phrase search to avoid stemming in Solr

I have stemming enabled in my Solr instance. I had assumed that in order to perform an exact word search without disabling stemming, it would be as simple as putting the word in quotes. This, however, does not appear to be the case. Is there a…
Ruth
3
votes
1 answer

R function doesn't loop through column but repeats first row result

I am trying to use the stemming function suggested in the corpus package stemming vignette here https://cran.r-project.org/web/packages/corpus/vignettes/stemmer.html, but when I try to run the function on the entire column it seems to just be…
Kreitzbe87
3
votes
1 answer

How can I apply stemming into a dictionary?

I'm working on an NLP task. I compare a dataframe of articles with input words. The main goal is to classify a text if a set of words is found in it. I've tried to extract the values in the dictionary and convert them into a list and then apply stemming to…
Chacho Fuva
3
votes
2 answers

Python nltk stemmers never remove prefixes

I'm trying to preprocess words to remove common prefixes like "un" and "re"; however, all of nltk's common stemmers seem to completely ignore prefixes: from nltk.stem import PorterStemmer, SnowballStemmer,…
jon_simon
3
votes
1 answer

German stemmer is not removing feminine suffixes "-in" and "-innen"

In German, every job has a feminine and a masculine version. The feminine one is derived from the masculine one by adding an "-in" suffix. In the plural form, this turns into "-innen". Example: | English |…
sebrockm
3
votes
0 answers

Full-Text Search and stemming on multilanguage column

I have a table with a column that contains data in different languages, like this:

Id  Text   Language
1   name   en
2   names  en
3   имя    ru
4   nom    fr

I need full-text search for this multilingual column, but FTS is…
3
votes
1 answer

English verbs processing ending with 'e'

I am implementing a few string replacers, with these conversions in mind: 'thou sittest' → 'you sit', 'thou walkest' → 'you walk', 'thou liest' → 'you lie', 'thou risest' → 'you rise'. If I keep it naive, it is possible to use regex for this case to find &…
nehem
3
votes
3 answers

Word Base/Stem Dictionary

It seems my Google-fu is failing me. Does anyone know of a freely available word base dictionary that just contains bases of words? So, for something like strawberries, it would have strawberry. But does NOT contain abbreviations or misspellings or…
AHungerArtist
3
votes
0 answers

How to use Whoosh to extract unstemmed keywords from a text?

I’m using Whoosh with Haystack. Haystack does not abstract the keyword extraction in Whoosh, so I’m using Whoosh directly for this feature. @property def keywords(self): whoosh_backend = SearchForm().searchqueryset.query.backend if not…
Dawn Drescher