Highest Voted 'stemming' Questions

3 votes, 3 answers

Stemming does not work properly for MongoDB text index

I am trying to use the full-text search feature of MongoDB and am observing some unexpected behavior. The problem is related to the "stemming" aspect of the text indexing feature. The way full-text search is described in many articles online, if you have a…

asked Mar 31 '14 at 16:26

Michael Smolyak

3 votes, 1 answer

Stemming in Text Classification - Degrades Accuracy?

I am implementing a text classification system using Mahout. I have read that stop-word removal and stemming help to improve the accuracy of text classification. In my case, removing stop words gives better accuracy, but stemming is not helping much. I…

machine-learning mahout stemming text-classification

asked Mar 24 '14 at 07:26

GS Majumder

3 votes, 2 answers

Stemming some plurals with wordnet lemmatizer doesn't work

Hi, I have a problem with nltk (2.0.4): I'm trying to stem the words 'men' and 'teeth', but it doesn't seem to work. Here's my code: import nltk from nltk.corpus import…

nltk python-2.6 wordnet stemming lemmatization

asked Mar 11 '14 at 18:23

BlackOwl

3 votes, 1 answer

Text Classification - using stemmer degrades results?

There's an article about sentiment analysis of Arabic. At the beginning of page 5 it says: "Experiments also show that stemming words before feature extraction and classification nearly always degrades the results". Later on in the same…

nlp sentiment-analysis stemming text-classification

asked Jan 22 '14 at 21:47

Cheshie

3 votes, 1 answer

Sphinx morphology stem_en not working

I have a single-field Sphinx index with stemming set up as follows: index main_sphinxalert { # Options: type = rt path = /var/lib/sphinxsearch/data/main_sphinxalert morphology = stem_en #…

sphinx stemming

asked Apr 22 '13 at 12:23

awidgery

3 votes, 1 answer

Configuring Custom Lucene Analyzer to accept certain stop words

I need to modify the Lucene analyzer so that it recognizes the word "Ben" (a Dutch stop word). Kindly guide me further. How do I make the Lucene analyzer accept this word as a regular word? Repository.xml for…

lucene analyzer stemming hippocms

asked Apr 02 '13 at 18:39

user1901762

3 votes, 1 answer

How does Word find matching word forms in Advanced Search?

I have a Word document that has occurrences of both "perform" and "performance". When I use the advanced find tool in the Word UI (with the goal of eventually translating this to the Find.Execute command for programmatic searching in C#), I get different results…

c# algorithm search ms-word stemming

asked Jul 11 '12 at 19:09

Chris W.

3 votes, 1 answer

Multi language full text: Which stemming [Snowball] language should be used?

Which stemming language should I use if I want to support full-text search in all languages? As far as I know, the index needs to be created using a specific stemming language to support search in that language, but this is not possible for me as…

stemming full-text-search snowball

asked Apr 30 '12 at 12:15

ManojMarathayil

3 votes, 2 answers

Does stemming harm precision in text classification?

I have read that stemming harms precision but improves recall in text classification. How does that happen? When you stem, you increase the number of matches between the query and the sample documents, right?

text nlp classification stemming

asked Apr 29 '12 at 03:31

samsamara

2 votes, 1 answer

Strange behavior of Lucene SpanishAnalyzer class with accented words

I'm using the SpanishAnalyzer class in Lucene 3.4. When I parse accented words, I get a strange result. If I parse, for example, these two words: "comunicación" and "comunicacion", the stems I'm getting are "comun" and "comunicacion".…

lucene analyzer diacritics stemming

asked Nov 24 '11 at 05:41

Max

2 votes, 3 answers

How to use stemDocument in the R language tm (text mining) package?

I am trying to stem a corpus using stemDocument in the R tm package, which calls Java. I have tried the example in the tm manual: data("crude") crude[[1]] stemDocument(crude[[1]]) and get the following error: Could not initialize the…

java r stemming

asked Oct 01 '11 at 13:15

user974490

2 votes, 1 answer

NLP - Worse result when adding stemming or lemmatization for Sentiment Analysis

I'm trying to create a full pipeline of results for sentiment analysis on a smaller subset of the IMDB reviews (only 2k pos, 2k neg), so I'm trying to show results at each stage, i.e. without any pre-processing, then basic cleaning (remove specials,…

machine-learning nlp sentiment-analysis stemming lemmatization

asked Dec 13 '22 at 00:15

AdamG

2 votes, 1 answer

Solr does not provide existing result

I hope you can help me, because this problem is driving me crazy. To keep it simple: I have documents with fields named name_text_de_de which have the following…

solr solrcloud stemming

asked Nov 09 '22 at 15:26

Fide

2 votes, 1 answer

How to perform stemming and put back the words in the original review format?

I have a dataset with one column, full_text, that contains review text from an online website. I want to clean these reviews by removing stop words and stemming, and then put them back in their original format (having all stemmed words forming…

r nlp stop-words stemming snowball

asked Jul 17 '22 at 12:05

Adrianna

2 votes, 1 answer

Trying to convert plural words to singular words using regex but want to ignore a few words

I am currently trying to convert some plural words to singular in BigQuery, for example removing the "s" from "birds" to get "bird", but I want to ignore a few words like "less", "james", and "this". I was able to come up with this which ignores the…

regex google-bigquery stemming re2

asked Jan 08 '22 at 00:58

Kishan Kumar

