Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
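To make the definition concrete, here is a deliberately tiny suffix-stripping stemmer in Python. The suffix list and the minimum stem length are assumptions chosen for illustration. Real stemmers such as Porter or Snowball apply ordered rule steps with conditions on the remaining stem, so treat this as a sketch of the idea rather than a usable stemmer.

```python
# A toy suffix-stripping stemmer: the suffix list and MIN_STEM are
# illustrative assumptions, not the rules of Porter or Snowball.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order
MIN_STEM = 3  # never strip a suffix if fewer than 3 characters would remain


def toy_stem(word: str) -> str:
    """Lower-case `word` and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["connecting", "connected", "connects", "connection"]:
        print(w, "->", toy_stem(w))
```

The first three words all reduce to "connect", while "connection" comes back unchanged because "-ion" is not in the assumed suffix list. Gaps like this are what the larger rule sets of real stemmers are designed to close.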

531 questions
3 votes, 2 answers

Stemming process not working in Python

I have a text file that I am trying to stem after removing stopwords, but it seems that nothing changes when I run it. My file is called data0. Here is my code: ## Removing stopwords and tokenizing by words (split each word) from nltk.corpus…
Economist_Ayahuasca
3 votes, 0 answers

Stemming for the Polish language using the Google App Engine Python Search API

I'm trying to use the Python Search API in Google App Engine to search through a set of Polish documents, and I found that the stemming feature is not working as expected. The word "red" in English has only one form, although there are different forms of it…
3 votes, 0 answers

Elasticsearch snowball in French not stemming correctly

I've seen a problem with the same stem word in French. Here is an example: snowball in French or curl -XDELETE http://localhost:9200/stacko36088193 curl -XPOST http://localhost:9200/stacko36088193 -d ' { "index": { "number_of_shards": 1, …
Roukmoute
3 votes, 1 answer

How to Stem Shakespeare/KJV Using nltk.stem.snowball

I want to stem early modern English text: sb.stem("loveth") >>> "lov". Apparently, all I need is a small tweak to the Snowball stemmer: And to put the endings into the English stemmer, the list ed edly ing ingly of Step 1b should be…
Joseph
3 votes, 1 answer

In the Porter Stemming algorithm, what is the purpose of including an identity rule such as SS -> SS?

What is the point of the Porter stemmer algorithm having a rule that converts SS to SS?
CodyBugstein
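One way to see the purpose of the identity rule: within each step, Porter applies only the rule whose suffix is the longest match. SS -> SS therefore "claims" words ending in -ss, so the more general S -> (empty) rule cannot strip their final s. The Python sketch below covers Step 1a only (the four rules from Porter's 1980 description, with his examples in the comments) and compares the result with and without the identity rule.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (suffix, replacement)

# Step 1a of the Porter stemmer; comments give Porter's own examples.
STEP_1A: List[Rule] = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (the identity rule)
    ("s", ""),       # cats     -> cat
]


def step_1a(word: str, rules: List[Rule] = STEP_1A) -> str:
    """Apply only the rule with the longest matching suffix."""
    matches = [rule for rule in rules if word.endswith(rule[0])]
    if not matches:
        return word
    suffix, replacement = max(matches, key=lambda rule: len(rule[0]))
    return word[: len(word) - len(suffix)] + replacement


# The same step with the identity rule removed
WITHOUT_IDENTITY = [rule for rule in STEP_1A if rule != ("ss", "ss")]

if __name__ == "__main__":
    print(step_1a("caress"))                    # caress
    print(step_1a("caress", WITHOUT_IDENTITY))  # cares: the final s is lost
```

In other words, the identity rule is not meant to change anything. It exists so that -ss words match a rule that leaves them alone.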
3 votes, 1 answer

Snowball Stemming: defining Regions

I'm trying to understand the Snowball stemming algorithm. The algorithm uses two regions, R1 and R2, that are defined as follows: R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if…
HW90
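R2 is defined by applying the same rule again inside R1: it is the region after the first non-vowel following a vowel in R1, or the null region at the end of the word if there is no such non-vowel. Below is a minimal Python sketch of both regions. It assumes the English vowel set a, e, i, o, u, y and ignores the English stemmer's special cases, such as its exceptional prefixes and the treatment of some y's as consonants.

```python
from typing import Tuple

VOWELS = frozenset("aeiouy")  # assumed English vowel set


def _region_start(word: str, start: int = 0) -> int:
    """Index just past the first non-vowel that follows a vowel,
    looking only at word[start:]. Returns len(word) (the null region)
    when there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def regions(word: str) -> Tuple[str, str]:
    """Return (R1, R2) for `word`."""
    r1 = _region_start(word)
    r2 = _region_start(word, r1)  # R2 applies the same rule inside R1
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ["beautiful", "beauty", "beau", "animadversion", "sprinkled", "eucharist"]:
        print(w, regions(w))
```

For "beautiful" this gives R1 = "iful" and R2 = "ul", while "beau" has no non-vowel after a vowel, so both regions are empty.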
3 votes, 1 answer

Solr: how can I rank the original term before the stemmed version?

I have been trying to get the exact-match results first in the Solr 5.0.0 results. For example: Meditation Bowls Goddess Bowls Celestial Bowls Bowling Green 33 Bowls Tibetan Singing Bowls Dust Bowl Revival Bowl of Stars If I search for a word…
User123
3 votes, 1 answer

Are there any Lucene stemmers that handle Shakespearean English?

I'm trying to index some old documents for searching -- 16th, 17th, 18th century. Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh. Are there stemmers that specialize in the English from the time of…
Eric Wilson
3 votes, 1 answer

How to split a text into two meaningful words in R

This is the text in my data frame df, which has a text column called 'problem_note_text': SSCIssue: Note Dispenser Failureperformed checks / dispensor failure / asked the stores to take the note dispensor out and set it back / still error message…
Shweta Kamble
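A common way to split run-together tokens such as "Failureperformed" is dictionary-based word segmentation with dynamic programming. The sketch below is in Python rather than R, and its vocabulary is a small hypothetical set. In practice the vocabulary would come from a word list or from the corpus, and choosing between several valid splits usually needs a scoring rule such as word frequency. Here the sketch simply prefers the split with the fewest words.

```python
from typing import List, Optional, Set

# Hypothetical vocabulary for illustration; use a real word list in practice
VOCAB: Set[str] = {"failure", "perform", "performed", "ed", "note", "dispenser"}


def segment(text: str, vocab: Set[str] = VOCAB) -> Optional[List[str]]:
    """Split `text` into vocabulary words, preferring the fewest pieces.
    Matching is case-insensitive; the pieces keep the original casing.
    Returns None when no complete split exists."""
    lower = text.lower()
    n = len(lower)
    # best[i] is the shortest list of words covering text[:i], or None
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None or lower[start:end] not in vocab:
                continue
            candidate = prefix + [text[start:end]]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


if __name__ == "__main__":
    print(segment("Failureperformed"))
```

With this vocabulary, "Failureperformed" could also be read as "Failure" + "perform" + "ed". The fewest-words rule picks ["Failure", "performed"] instead.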
3 votes, 2 answers

Is it possible to get a natural word after it has been stemmed?

I have the word "play", which after stemming became "plai". Now I want to get "play" back. Is that possible? I used Porter's stemmer.
odbhut.shei.chhele
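Stemming is lossy, so the original word cannot be recovered from the stem alone. A common workaround is to record which surface forms produced each stem while stemming a corpus, then map each stem back to its most frequent form. R's tm stemCompletion follows a similar idea. In this minimal Python sketch, `toy_stem` and `build_unstem_map` are hypothetical names, and `toy_stem` is only a stand-in that loosely mimics the y -> i behaviour of Porter's original Step 1c. Swap in a real stemmer in practice.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, Dict, Iterable


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips -ing/-ed/-s and
    turns a final y into i, loosely like Porter's original Step 1c."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: len(word) - len(suffix)]
            break
    return word[:-1] + "i" if word.endswith("y") else word


def build_unstem_map(tokens: Iterable[str], stem: Callable[[str], str]) -> Dict[str, str]:
    """Map each stem to the surface form that produced it most often."""
    forms: DefaultDict[str, Counter] = defaultdict(Counter)
    for token in tokens:
        forms[stem(token)][token] += 1
    # most_common(1) breaks ties by order of first occurrence
    return {s: counts.most_common(1)[0][0] for s, counts in forms.items()}


if __name__ == "__main__":
    corpus = ["play", "playing", "played", "plays", "play"]
    unstem = build_unstem_map(corpus, toy_stem)
    print(toy_stem("play"), "->", unstem[toy_stem("play")])
```

The map can only return words that actually occurred in the corpus. A stem never seen during stemming has no entry.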
3 votes, 1 answer

How Do I Use BrazilianStemmer in Lucene 4?

I'm trying to tokenize and stem a Portuguese sentence using Lucene 4. Based on this thread (How to use a Lucene Analyzer to tokenize a String?), I was able to correctly tokenize a Portuguese sentence. However, no stemming was applied. Thus,…
3 votes, 1 answer

Multiple results for one variable when applying the tm method "stemCompletion"

I have a corpus containing journal data of 15 observations of 3 variables (ID, title, abstract). Using RStudio, I read in the data from a .csv file (one line per observation). When performing some text mining operations, I ran into trouble when using…
Dobby
3 votes, 1 answer

Morphology: tool to get the root word and suffix for a given English word

I am trying to do morphological analysis in POS tagging. Is there any tool (which I can call from within a Python or Java script) that returns the root form and its suffix when passed an English word as a parameter? For example: if I give…
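Below is a minimal sketch of the requested behaviour. It assumes a hypothetical suffix list and a hypothetical lexicon of known roots, tries the longest suffix first, and accepts a split only when the remaining root is a known word. Real morphological analyzers also handle spelling changes (for example "making" from "make"), which this sketch does not.

```python
from typing import Optional, Tuple

# Hypothetical data for illustration only
SUFFIXES = ["ness", "ing", "ed", "ly", "s"]
LEXICON = {"play", "quick", "kind", "walk"}


def split_root_suffix(word: str) -> Tuple[str, Optional[str]]:
    """Return (root, suffix), or (word, None) if no known root is found."""
    word = word.lower()
    # Longest suffixes first, so "-ness" is tried before "-s"
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        root = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and root in LEXICON:
            return root, suffix
    return word, None


if __name__ == "__main__":
    for w in ["playing", "kindness", "quickly", "walks", "kind"]:
        print(w, split_root_suffix(w))
```

Checking the root against a lexicon avoids splits like "kindness" -> "kindnes" + "s", at the cost of needing a word list.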
3 votes, 1 answer

Hunspell affix condition regex format. Any way to match the start?

Good day. I'm trying to use Hunspell as a stemmer in my application. I don't quite like Porter and Snowball stemming because of their "chopped" results, such as "abus" and "exampl". Lemmatizing seems like a good alternative, but I don't know any good…
SimpleV
3 votes, 1 answer

Turn stemming off in Lucene

I need to turn off stemming in the EnglishAnalyzer or other similar analyzers (such as the ItalianAnalyzer, etc.). I'm using Lucene 3.6.2, and I saw that it is only possible to specify a set of words that should not be stemmed using this…
Luca Mastrostefano