Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.

531 questions
0
votes
1 answer

Stanford CoreNLP Morphology.stemStatic disable lowercase conversion?

The comments on the stemStatic method of the Morphology class state that it will: return a new WordTag which has the lemma as the value of word(). The default is to lowercase non-proper-nouns, unless options have been…
Darrell Berry
0
votes
0 answers

Elasticsearch Analyzer - Stemmer Not Working

The settings for one of my indexes are as follows; however, the stemmer isn't being applied. For example, a search for fox will not pick up articles that include the term foxes. I can't see why, as the order of the filters is correct (lowercase precedes…
SSED
0
votes
2 answers

stemming words in python

I'm using this code to stem words. Here is how it works: first there's a list of suffixes; the program checks whether the word ends with one of the suffixes in the list and, if so, removes the suffix. However, when I run the code I get this result: …
Andrew Ravus
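The suffix-stripping approach described in this question can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the suffix list, the `strip_suffix` name, and the `min_stem_len` guard are all assumptions introduced here. The guard is one common way to avoid stripping a word down to almost nothing.

```python
# Hypothetical suffix list; the list used in the original question is not shown.
SUFFIXES = ["ing", "ed", "es", "s"]


def strip_suffix(word, suffixes=SUFFIXES, min_stem_len=3):
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    # Try longer suffixes first so that "es" is preferred over "s".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No suffix matched (or the stem would be too short): return the word unchanged.
    return word


words = ["walking", "walked", "foxes", "cats", "is"]
print([strip_suffix(w) for w in words])
```

A naive stripper like this over-stems and under-stems many English words; an established algorithm such as Porter's handles far more cases.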
0
votes
0 answers

Extracting keywords from given query

I am implementing a keyword-based search project. While processing the input, the program must extract keywords as follows: ignore punctuation marks (e.g. .!?, etc.), ignore binding words (e.g. and, or, so, etc.), and the last and most important task is…
Rauf Aghayev
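The first two steps listed in this question (dropping punctuation and binding words) might look like the sketch below. The third step is cut off in the excerpt and is not covered. The `extract_keywords` name and the stop-word set are assumptions made for illustration only.

```python
import string

# Hypothetical stop-word list for illustration; a real project would use a fuller one.
STOP_WORDS = {"and", "or", "so", "the", "a", "an", "of", "to", "in"}


def extract_keywords(query, stop_words=STOP_WORDS):
    """Lowercase the query, drop ASCII punctuation, and filter out stop words."""
    # Map every ASCII punctuation character to a space so words stay separated.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = query.translate(table).lower().split()
    return [token for token in tokens if token not in stop_words]


print(extract_keywords("Cats and dogs, or so they say!"))
```

Replacing punctuation with spaces rather than deleting it keeps inputs such as "cats,dogs" from merging into a single token.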
0
votes
1 answer

Does Mahout support word stemming?

I'm using Mahout to do topic discovery using LDA. To prepare my data I use seq2sparse, which tokenizes the documents and creates n-grams. However, it does not support word stemming by default. Does Mahout have any built-in word stemming?…
HHH
0
votes
2 answers

Stemming text located in a list

import re import…
enderub
0
votes
1 answer

Porter stemmer algorithm in information-retrieval

I need to create a simple search engine for my application. Let's simplify it to the following: we have a lot of texts, and I need to search them and show relevant results. Building on this great article, I've extended some things and it works pretty well…
nrudnyk
0
votes
1 answer

Greek words stemming Lucene

Is there any way to stem single Greek words with Lucene? Do I need to index the String, or is there a simpler way? I did some research and found this link, but I don't really know how to use the Greek Stemming Filter.
Vegeta
0
votes
1 answer

R : Text Analysis - tm Package - stemComplete error

Machine: Windows 7, 64-bit. R version: R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet". I am working on stemming some text for an analysis, and I am able to do everything up until 'stemComplete'. For more context please see…
Jacob Johnston
0
votes
1 answer

Why does PorterStemmer in NLTK convert my "string" into u"string"?

import nltk import string from nltk.corpus import stopwords from collections import Counter def get_tokens(): with open('comet_interest.xml','r') as bookmark: text=bookmark.read() lowers=text.lower() …
Hao Wu
0
votes
1 answer

Stemming with term vector component in Solr

I am using the term vector component in Solr to build a tag cloud. I am also using the Porter stem filter factory for stemming at both index and query time. The problem is that the term vector component shows stemmed words in the final output with term frequency. Example: If…
user199354
0
votes
1 answer

Default English stemming in SOLR

I'm trying to do simple English word stemming in SOLR, but for some reason I'm not successful. My XML doc looks like this: 1 walked 2
Egizeris
0
votes
1 answer

Difference in handling possessives (apostrophes) with the English stemmer between 1.2 and 1.4

We have two instances of Elasticsearch, one running 1.2.1 and one running 1.4. The settings and the mapping are identical on the indices running on both instances, yet the results are different. The setting for the default analyzer: .... analysis: { …
Dmitry Fink
0
votes
1 answer

Calling a Snowball/Porter2 Stemmer from T-SQL

I'm trying to come up with an easy way for analysts at my office to invoke a stemmer from MSSQL. It would be used to generate stemmed notes fields for two purposes: to create training sets where the most common stemmed notes fields are counted…
user2027080
0
votes
1 answer

Lucene - default search lemmatization/stemming

Does Lucene's default search perform lemmatization/stemming on the words? For example, when using the code in this sample, are the words in the docs used as is, or are they transformed to their basic form (i.e. Managing -> manag), and if so what default…
zvisofer