Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.
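
To make the definition concrete, here is a deliberately naive suffix-stripping stemmer in Python. It is a sketch, not the Porter algorithm or any library's stemmer. The suffix list and the three-letter minimum stem length are arbitrary assumptions chosen for illustration.

```python
# Naive suffix-stripping stemmer (illustration only; NOT the Porter algorithm).
# The suffix list and minimum stem length are arbitrary, hypothetical choices.
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    # Try longer suffixes first so "edly" wins over "ly".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for w in ["connecting", "connected", "connects", "connection"]:
        print(w, "->", naive_stem(w))
```

The first three inputs share the stem `connect`, while `connection` is left alone because `-ion` is not in the list. Real stemmers such as Porter's use ordered rule sets with conditions on the remaining stem rather than a flat suffix list.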

531 questions
0 votes, 1 answer

Should I reindex documents in Elasticsearch when I change the Stemmer?

I am using Elasticsearch to index my documents (although I believe my question applies to any other search engine, such as Lucene or Solr, as well). I am using the Porter stemmer and a list of stop words at index time. I know that I should apply the…
Soheil
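
The question above comes down to whether index-time and query-time analysis still agree after the change. The toy in-memory inverted index below is plain Python, not the Elasticsearch API, and `stem_v1` and `stem_v2` are hypothetical stand-ins for an old and a new stemmer. It shows what goes wrong when documents indexed with one stemmer are queried with another.

```python
from collections import defaultdict


def stem_v1(token: str) -> str:
    """Hypothetical 'old' stemmer: strips a trailing 's' only."""
    return token[:-1] if token.endswith("s") else token


def stem_v2(token: str) -> str:
    """Hypothetical 'new' stemmer: also strips a trailing 'ing'."""
    t = stem_v1(token)
    return t[:-3] if t.endswith("ing") and len(t) > 5 else t


def build_index(docs: dict, stem) -> dict:
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            index[stem(tok)].add(doc_id)
    return index


def search(index: dict, query: str, stem) -> set:
    """AND-query: every query term goes through `stem` before lookup."""
    results = None
    for tok in query.lower().split():
        hits = index.get(stem(tok), set())
        results = set(hits) if results is None else results & hits
    return results or set()


if __name__ == "__main__":
    docs = {1: "walking shoes", 2: "walk fast"}
    old_index = build_index(docs, stem_v1)
    print(search(old_index, "walking", stem_v2))  # only doc 2: index still holds "walking"
    new_index = build_index(docs, stem_v2)
    print(search(new_index, "walking", stem_v2))  # docs 1 and 2
```

Querying the old index with the new stemmer finds only document 2, because document 1 was indexed under the unstemmed term `walking`; after rebuilding the index with `stem_v2`, both documents match. This suggests that, in general, documents need to be reindexed whenever the index-time stemmer changes.
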
0 votes, 1 answer

Most memory-efficient way to combine word stemming and the elimination of hash words in Perl?

I've patched together a Perl script intended to take each word from a batch of documents, eliminate all stop words, stem the remaining words, and create a hash containing each stemmed word and its frequency of occurrence. However, after working…
Rick
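
The question above uses Perl; as an illustration of the pipeline rather than a Perl answer, here is the same idea sketched in Python. It reads input one line at a time, so memory grows with the number of distinct stems rather than with the size of the corpus. The stop-word list and the `naive_stem` helper are hypothetical placeholders, not taken from the question or from any library.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is"})


def naive_stem(word: str) -> str:
    """Placeholder stemmer (NOT Porter): strips one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_frequencies(lines: Iterable[str]) -> Counter:
    """Count stems of non-stop words, consuming the input line by line."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOP_WORDS:
                counts[naive_stem(word)] += 1
    return counts


if __name__ == "__main__":
    sample = ["The cats and the dog", "A dog is chasing cats"]
    print(stem_frequencies(sample))
```

Passing an open file object, such as `open(path, encoding="utf-8")`, as `lines` streams the file instead of loading it whole.
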
0 votes, 1 answer

Stem comparison algorithm

I'm writing a program that generates word declensions for the Polish language. In this language, stems can vary in some cases (because of palatalization, mobile/fleeting e, and other effects). For example, we have the word "karzeł", and it is the basic dictionary…
Harry
0 votes, 1 answer

Is there an option to toggle stemming in Solr?

I would like to have an option to turn stemming on and off in my searches using some toggle. How can I do that?
user1748101
0 votes, 1 answer

Stemming words in NLP

Can anyone tell me which stemmer is best? Also, I have a text and I only want to stem the words that are in a list and leave the rest of the tokens as they are. Below is my code. Text: swot del swot analys 2013 strengths weak brand nam valu at $ 7 .',…
Raghav Shaligram
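
For the second part of the question above (stemming only the tokens that appear in a given list), one approach is to look each token up in a set and stem only on a hit. The sketch takes the stemmer as a parameter, so any callable that maps a word to a stem can be passed, for example the `stem` method of an NLTK `PorterStemmer` instance. `strip_s` is a hypothetical stand-in used so the example runs with the standard library alone.

```python
from typing import Callable, Iterable, List


def selective_stem(tokens: Iterable[str], targets: Iterable[str],
                   stem: Callable[[str], str]) -> List[str]:
    """Stem tokens whose lowercase form is in `targets`; keep the others unchanged."""
    target_set = {t.lower() for t in targets}  # set gives O(1) membership checks
    return [stem(tok) if tok.lower() in target_set else tok for tok in tokens]


def strip_s(word: str) -> str:
    """Hypothetical toy stemmer: drops a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


if __name__ == "__main__":
    tokens = "swot del swot analys 2013 strengths weak brand nam valu".split()
    print(selective_stem(tokens, ["strengths", "brand"], strip_s))
```

Note that `analys` is left untouched even though it ends in "s", because it is not in the target list.
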
0 votes, 2 answers

Implementing Kstemmer

First, I thank anyone who takes the time to help; the internet community is so essential for learning. Overall goal: I am inputting a .txt file, stemming it using a Java build of the 2003 CIIR KStemmer in Eclipse, and outputting a list of stemmed…
0 votes, 0 answers

Stemming, lemmatization in Python

I have checked all the other threads and used a few of the solutions. I am facing a challenge in using the Porter stemmer. I am trying to eliminate the affixes; however, the Porter stemmer reduces words to some weird forms, like languages becoming languag and…
Raghav Shaligram
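
The forms the question above calls weird are expected: a stem only needs to be shared by related words, not to be a dictionary word. If readable output is needed, one common workaround, shown here as a sketch rather than a feature of any particular library, is to remember which surface word produced each stem most often and display that word instead. `toy_stem` is a hypothetical placeholder that, like Porter, can yield non-words.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable


def display_forms(words: Iterable[str], stem: Callable[[str], str]) -> Dict[str, str]:
    """Map each stem to the surface word that produced it most often."""
    seen = defaultdict(Counter)
    for w in words:
        seen[stem(w.lower())][w.lower()] += 1
    # most_common(1) picks the most frequent form; ties go to the form seen first.
    return {s: forms.most_common(1)[0][0] for s, forms in seen.items()}


def toy_stem(word: str) -> str:
    """Hypothetical stemmer that can produce non-words such as 'languag'."""
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(display_forms(["Language", "language", "languages"], toy_stem))
```

Stems stay the internal key for matching; only the displayed label changes. If dictionary forms are required throughout, a lemmatizer rather than a stemmer is the usual tool.
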
0 votes, 0 answers

Stemming CSV files in Python

Okay, I have this code in Python that imports two CSV files. The first CSV file is named "claims" (one column, many rows) and the other one is named "sexualHarassment" (one column, many rows). The program right now checks all rows of "claims"…
Abtra16
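
A sketch of the comparison the question above describes, assuming (as stated there) that each file has a single column and no header row: stem every keyword once, then flag a claim if any of its stemmed tokens is in that keyword set. File contents are simulated with `io.StringIO`; `toy_stem` and `flag_claims` are hypothetical names, and a real stemmer can be substituted for `toy_stem`.

```python
import csv
import io
import re
from typing import Callable, List, TextIO, Tuple


def toy_stem(word: str) -> str:
    """Hypothetical placeholder stemmer: strips one of a few suffixes."""
    for suffix in ("ment", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def first_column(f: TextIO) -> List[str]:
    """Read the first column of a one-column CSV, skipping empty rows."""
    return [row[0] for row in csv.reader(f) if row]


def flag_claims(claims: TextIO, keywords: TextIO,
                stem: Callable[[str], str]) -> List[Tuple[str, bool]]:
    """Pair each claim with True if it shares a stem with any keyword."""
    keyword_stems = {stem(k.strip().lower()) for k in first_column(keywords)}
    results = []
    for claim in first_column(claims):
        tokens = re.findall(r"[a-z]+", claim.lower())
        results.append((claim, any(stem(t) in keyword_stems for t in tokens)))
    return results


if __name__ == "__main__":
    claims = io.StringIO("He was harassing staff\nLate delivery of goods\n")
    keywords = io.StringIO("harassment\n")
    for claim, hit in flag_claims(claims, keywords, toy_stem):
        print(hit, claim)
```

Here "harassment" and "harassing" both reduce to `harass`, so the first claim is flagged even though the exact keyword never appears in it.
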
0 votes, 0 answers

How to stem Indonesian using Lucene

I have tried to use the class IndonesianAnalyzer from the library org.apache.lucene.analysis.id.*; now I have a problem initializing the IndonesianAnalyzer class in my project. My code looks like this: import java.io.*; import…
0 votes, 1 answer

Mapping of words to stemmed words (Stem dictionary)

I want to generate a mapping of (word, stemmed word) pairs, which I'll need for my project. I am trying to generate the mapping this way: 1. I took a text (in file 1), used RapidMiner to stem all the words, and saved the resulting text in another file, say…
user3290349
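
Rather than stemming a whole file and aligning it with the original (as the question above describes), it may be simpler to build the mapping directly by stemming each distinct word once. The sketch also builds the reverse table from each stem to the words that produced it. `toy_stem` is a hypothetical stand-in; RapidMiner's own API is not shown here.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


def toy_stem(word: str) -> str:
    """Hypothetical placeholder stemmer: strips one of a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_maps(text: str, stem: Callable[[str], str]
                    ) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return (word -> stem, stem -> set of words) for every distinct word."""
    word_to_stem: Dict[str, str] = {}
    stem_to_words = defaultdict(set)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in word_to_stem:  # stem each distinct word only once
            s = stem(word)
            word_to_stem[word] = s
            stem_to_words[s].add(word)
    return word_to_stem, dict(stem_to_words)


if __name__ == "__main__":
    w2s, s2w = build_stem_maps("Walked walking walks; the walk", toy_stem)
    print(w2s)
    print(s2w)
```

Because each word is stemmed in place, no alignment between two files is needed, and the mapping stays correct even when the stemmer changes token boundaries or drops words.
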
0 votes, 1 answer

Lucene project fatal error

I have a lot of text messages, and I run the lines of code below on them. // tokenize term TokenStream tokenStream = new ClassicTokenizer(LUCENE_VERSION, new StringReader(term)); // stemmize tokenStream = new PorterStemFilter(tokenStream); SOMETIMES I…
0 votes, 1 answer

Error message while stemming for sentiment analysis

I am stemming my dataset for sentiment analysis and got this error message: "Error in structure(if (length(n)) n else NA, names = x) : 'names' attribute [2] must be the same length as the vector [1]". Please…
user3456230
0 votes, 1 answer

StanfordCoreNLP does not work the way I expect

I use the code below. However, the outcome is not what I expected: it is [machine, Learning], but I want to get [machine, learn]. How can I do this? Also, when my input is "biggest bigger", I want to get a result like [big, big], but the outcome…
CSnerd
0 votes, 2 answers

A simple stemming algorithm with String for input

I've been looking at word stemming algorithms such as the Porter algorithm, but everything I've found so far has dealt with files as input. Are there any existing algorithms that would let me simply pass the stemmer a string and have it return the…
user3163073
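
Most stemmer implementations work on one word at a time, so passing a string instead of a file is mostly a matter of tokenizing it first; file handling is just a wrapper around that step. For example, NLTK's `PorterStemmer().stem(...)` takes a single word as a string. The sketch keeps the stemmer pluggable and defaults to a hypothetical `toy_stem` so it runs with the standard library alone.

```python
import re
from typing import Callable, List


def toy_stem(word: str) -> str:
    """Hypothetical placeholder stemmer (NOT Porter): strips one suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_string(text: str, stem: Callable[[str], str] = toy_stem) -> List[str]:
    """Split `text` into lowercase alphabetic words and stem each one."""
    return [stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]


if __name__ == "__main__":
    print(stem_string("Stemmers reduce inflected words"))
```

With NLTK installed, `stem_string(text, PorterStemmer().stem)` (after `from nltk.stem import PorterStemmer`) would apply the Porter algorithm instead of the toy rules.
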
0 votes, 1 answer

Stemming demonyms in Solr (Russian => Russia)

Trying to match queries containing "russia" or "russian" to "Russian Federation" using Solr (as well as other country demonyms, such as "american", "syrian", etc.). What is a good way to handle this without adding synonyms for each country, and…
Neil McGuigan
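
Suffix-stripping stemmers such as Porter's generally do not collapse "russian" to "russia", since "-n" is not treated as an inflectional suffix. One workaround is to generate the synonym pairs from a country list instead of writing them by hand. The sketch below assumes that many demonyms are formed by appending "n" to a country name ending in "a"; that rule and the function names are hypothetical, and irregular demonyms such as "french" still need explicit entries.

```python
from typing import Dict, Iterable, List


def demonym_map(countries: Iterable[str]) -> Dict[str, str]:
    """Hypothetical rule: a country ending in 'a' plus 'n' gives its demonym."""
    mapping = {}
    for country in countries:
        c = country.lower()
        if c.endswith("a"):
            mapping[c + "n"] = c  # 'russian' -> 'russia', 'syrian' -> 'syria'
    return mapping


def normalize(tokens: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Lowercase tokens and replace known demonyms with the country name."""
    return [mapping.get(t.lower(), t.lower()) for t in tokens]


if __name__ == "__main__":
    m = demonym_map(["Russia", "Syria", "America", "France"])
    for demonym, country in sorted(m.items()):
        print(f"{demonym} => {country}")  # one explicit-mapping line per pair
    print(normalize("Russian Federation".split(), m))
```

Applying the same normalization at both index and query time makes "russia", "russian" and "Russian Federation" share the token `russia`. The printed lines follow the `russian => russia` form used in the question title, which is the explicit-mapping syntax of Solr synonym files, so the generated list could be written to such a file rather than maintained by hand.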