Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form
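As a rough illustration of the idea, here is a minimal sketch of suffix stripping in Python. It is a toy, not the Porter or Snowball algorithm: the `SUFFIXES` tuple and the `toy_stem` function are hypothetical names invented for this example, and the rule set is deliberately tiny.

```python
# Toy suffix stripper for illustration only; NOT the Porter or Snowball algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("networking", "networked", "networks", "network"):
        print(w, "->", toy_stem(w))
```

Real stemmers apply many more rules and conditions, which is why a library implementation is normally preferable to hand-written rules like these.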

531 questions
4
votes
1 answer

Italian Stemmer alternative to Snowball

I'm trying to analyze Italian texts in R. As is usual in text analysis, I have removed all punctuation, special characters and Italian stopwords. But I have a problem with stemming: there is only one Italian stemmer (Snowball),…
4
votes
1 answer

Is there a way to reverse stem in python nltk?

I have a list of stems in NLTK/python and want to get the possible words that create that stem. Is there a way to take a stem and get a list of words that will stem to it in python?
JoeShmoe
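Stemming discards information, so a stem cannot be reversed on its own. One possible approach, sketched here under assumptions, is to stem a known vocabulary once and keep a mapping from each stem back to the words that produced it. `build_reverse_index` and `placeholder_stem` are hypothetical names for this example; in practice the `stem` argument would be a real stemmer's function, such as the `stem` method of an NLTK stemmer object.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_reverse_index(
    words: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, Set[str]]:
    """Map each stem to the set of vocabulary words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        index[stem(word)].add(word)
    return dict(index)


def placeholder_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer: drops a trailing "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


if __name__ == "__main__":
    vocabulary = ["cat", "cats", "dog", "dogs"]
    index = build_reverse_index(vocabulary, placeholder_stem)
    print(sorted(index["cat"]))
```

The result can only contain words that were in the vocabulary you supplied, so its coverage depends entirely on that word list.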
4
votes
2 answers

Difference between Lucene stemmers: EnglishStemmer, PorterStemmer, LovinsStemmer

Has anybody compared these stemmers from Lucene (package org.tartarus.snowball.ext): EnglishStemmer, PorterStemmer, LovinsStemmer? What are the strong and weak points of the algorithms behind them? When should each of them be used? Or maybe there are some…
Paul Lysak
4
votes
4 answers

python, Stemmer not found

I got this code from GitHub, and it is meant to run on a 64-bit Windows machine. Here's the error I get: Traceback (most recent call last): File "new.py", line 2, in import stemmer ModuleNotFoundError: No module named 'stemmer' import…
saqibiqbal
4
votes
1 answer

Stemming words with NLTK (python)

I am new to Python text processing. I am trying to stem the words in a text document that has around 5000 rows. I have written the script below: from nltk.corpus import stopwords # Import the stop word list from nltk.stem.snowball import SnowballStemmer stemmer =…
user3734568
4
votes
3 answers

Stemming full strings on Python

I need to perform stemming on Portuguese strings. To do so, I'm tokenizing the string using the nltk.word_tokenize() function and then stemming each word individually. After that, I rebuild the string. It's working, but not performing well. How can I make…
yuridamata
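When the same words recur across many strings, one option worth trying is to cache stems so that each distinct word is stemmed only once. The sketch below assumes the stemmer call is the bottleneck, which may not hold in every setup. `slow_stem`, `cached_stem` and `stem_text` are hypothetical names, `slow_stem` stands in for a real Portuguese stemmer, and `str.split()` is a simplification of `nltk.word_tokenize()`.

```python
from functools import lru_cache


def slow_stem(word: str) -> str:
    # Hypothetical stand-in for an expensive stemmer call.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


@lru_cache(maxsize=None)
def cached_stem(word: str) -> str:
    """Stem each distinct word once; repeats are served from the cache."""
    return slow_stem(word)


def stem_text(text: str) -> str:
    # str.split() is a simplification of real tokenization.
    return " ".join(cached_stem(token) for token in text.split())


if __name__ == "__main__":
    print(stem_text("casas casas livros casas"))
    print(cached_stem.cache_info())
```

Profiling first is advisable: if tokenization or I/O dominates the run time, caching stems will not help much.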
4
votes
0 answers

Stemming Dutch words with the Kraaij-Pohlmann algorithm

I am trying to stem Dutch words in a corpus in R. I have found the SnowballC package, but this doesn't seem to work well for Dutch. For example: wordStem(c("huis", "huizen", "huisje", "huisjes"), language = "porter") [1] "huis" "huiz" …
Charlotte
4
votes
1 answer

Smart stemming/lemmatizing in Python for Nationalities

I am working with Python, and I would like to find the roots of some words that mainly refer to countries. Some examples that demonstrate what I need are: Spanish should give me Spain. English should give me England. American should give me…
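Mappings such as Spanish to Spain do not follow regular suffix rules, so a rule-based stemmer is unlikely to produce them. One simple alternative, sketched here as an assumption rather than a recommendation from the question, is an explicit lookup table. `DEMONYM_TO_COUNTRY` and `country_for` are hypothetical names, and the table holds only the two complete pairs given in the question.

```python
from typing import Optional

# Hypothetical, hand-maintained table; a real one would need many more entries.
DEMONYM_TO_COUNTRY = {
    "spanish": "Spain",
    "english": "England",
}


def country_for(demonym: str) -> Optional[str]:
    """Return the country for a demonym, or None if it is not in the table."""
    return DEMONYM_TO_COUNTRY.get(demonym.strip().lower())


if __name__ == "__main__":
    for word in ("Spanish", "English", "Martian"):
        print(word, "->", country_for(word))
```

Returning `None` for unknown words makes missing entries visible, so the table can be extended as new cases turn up.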
4
votes
4 answers

SQL word root matching

I'm wondering whether major SQL engines out there (MS SQL, Oracle, MySQL) have the ability to understand that 2 words are related because they share the same root. We know it's easy to match "networking" when searching for "network" because the…
Max
4
votes
2 answers

Stop-word elimination and stemmer in python

I have a somewhat large document and want to do stop-word elimination and stemming on the words of this document with Python. Does anyone know of an off-the-shelf package for these? If not, code that is fast enough for large documents is also…
Hossein
4
votes
1 answer

StandardAnalyzer with stemming

Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy/paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there any smarter way? Also, if I would…
Kobe-Wan Kenobi
4
votes
2 answers

Are Snowball & SnowballC packages different in R?

I am using stemDocument for stemming text document using tm package in R. Example code: data("crude") crude[[1]] stemDocument(crude[[1]]) I get an error message: Error in loadNamespace(name) : there is no package called ‘Snowball’ I have…
Ram
4
votes
3 answers

MySQL fulltext with stems

I am building a little search function for my site. I am taking my user's query, stemming the keywords and then running a fulltext MySQL search against the stemmed keywords. The problem is that MySQL is treating the stems as literal. Here is the…
johnnietheblack
4
votes
1 answer

NLTK words lemmatizing

I am trying to do lemmatization on words with NLTK. What I have found so far is that I can use the stem package to get some results, such as transforming "cars" to "car" and "women" to "woman"; however, I cannot do lemmatization on some words with affixes like…
noben
4
votes
1 answer

Advanced Search Option in Solr corresponding to DtSearch options

We are replacing the search and indexing module in an application, moving from DtSearch to Solr and using SolrNet as the .NET Solr client library. We are relatively new to Solr/Lucene and would need some help/direction to understand the more advanced search…
koder