Questions tagged [stemming]

The process of reducing inflected words to their stem.

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
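To make the definition concrete, here is a minimal sketch of suffix stripping in Python. It is a toy illustration only, not the Porter or Snowball algorithm; the names `SUFFIXES`, `toy_stem`, `stem_all`, and the `min_stem_len` parameter are made up for this example.

```python
# Toy suffix-stripping stemmer. Illustration only: this is NOT the
# Porter or Snowball algorithm, and the suffix list is an assumption.
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Lowercase the word and strip the first matching suffix,
    as long as at least min_stem_len characters remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


def stem_all(words):
    """Apply toy_stem to every word in an iterable."""
    return [toy_stem(w) for w in words]


if __name__ == "__main__":
    print(stem_all(["fighting", "fighted", "fights", "fight"]))
```

All four inputs collapse to "fight", which is the point of stemming for search and comparison. A rule set this small also over- and under-stems (for example, "running" becomes "runn"), which is why real projects usually rely on an established stemmer instead.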

531 questions
8
votes
1 answer

SnowballStemmer for Russian words list

I know how to apply SnowballStemmer to a single word (in my case, a Russian one). I do the following: from nltk.stem.snowball import SnowballStemmer stemmer = SnowballStemmer("russian") stemmer.stem("Василий") 'Васил' How can I do the…
Keithx
  • 2,994
  • 15
  • 42
  • 71
8
votes
1 answer

Italian stemming library in Java

I'm looking for a Java library or something similar to stem strings of Italian words. The goal is to compare Italian words. At the moment, words like "attacco", "attacchi", "attaccare", etc. are considered different; instead I want returned a true…
Schiawo
  • 95
  • 7
7
votes
2 answers

Snowball Stemmer only stems last word

I want to stem the documents in a Corpus of plain text documents using the tm package in R. When I apply the SnowballStemmer function to all documents of the corpus, only the last word of each document is stemmed.…
Christian
  • 211
  • 4
  • 5
7
votes
3 answers

How do I optimize the performance of stemming and spell check in R?

I have ~1.4 million documents with a per-document character count of median 250 and mean 470. I want to perform spell check and stemming before classifying them. Simulated document: sentence <- "We aree drivng as fast as we drove yestrday or evven…
Tlatwork
  • 1,445
  • 12
  • 35
7
votes
3 answers

Should I perform both lemmatization and stemming?

I'm writing a text classification system in Python. This is what I'm doing to canonicalize each token: lem, stem = WordNetLemmatizer(), PorterStemmer() for doc in corpus: for word in doc: lemma = stem.stem(lem.lemmatize(word)) The…
James Ko
  • 32,215
  • 30
  • 128
  • 239
7
votes
1 answer

SQL Server vs MySQL: CONTAINS(*,'FORMSOF(THESAURUS,word)')

I am shocked. I spent the past 3-4 days figuring out how I could implement stemming (and synonym searches) in MySQL, and then I see that in SQL Server the query is incredibly easy: Select * from tab where CONTAINS(*,'FORMSOF(THESAURUS,word)') Is it possible on…
dynamic
  • 46,985
  • 55
  • 154
  • 231
7
votes
2 answers

What is the best "turnkey" stemming algorithm?

I need a good stemming algorithm for a project I'm working on. It was suggested that I look at the Porter Stemmer. When I checked out the page on the Porter stemmer I found that it is deprecated now in favor of the "Snowball" stemmer. I need a good…
dicroce
  • 45,396
  • 28
  • 101
  • 140
7
votes
1 answer

Looking for a database or text file of English words with their different forms

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not good for my project. I found phpmorphy project…
Majid Darabi
  • 731
  • 6
  • 15
6
votes
4 answers

Stemming - code examples or open source projects?

Stemming is something that's needed in tagging systems. I use delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I write software for embedded systems that would be much more…
Adam Davis
  • 91,931
  • 60
  • 264
  • 330
6
votes
1 answer

Does keras-tokenizer perform the task of lemmatization and stemming?

Does the Keras tokenizer provide functions such as stemming and lemmatization? If it does, how is it done? I need an intuitive understanding. Also, what does text_to_sequence do in that?
ASingh
  • 133
  • 1
  • 4
6
votes
2 answers

Confused about priority between stemmer and pos tagger

So I was analyzing a text corpus and I used a stemmer on all the tokenized words. But I also have to find all the nouns in the corpus, so I then ran nltk.pos_tag(stemmed_sentence). My question is: am I doing it right? A.]…
user4197202
6
votes
1 answer

ElasticSearch Stemming

I am using Elasticsearch and I want to set up basic stemming for English. So basically, fighter returns fight or any word that contains the fight root. I am a little confused about how to implement this. I was reading through the analyzers, tokenizers and…
Gabbar
  • 4,006
  • 7
  • 41
  • 78
5
votes
1 answer

Lucene.NET stemming problem

I'm running into a problem using the SnowBallAnalyzer in Lucene.NET. It works great for some words, but for others it doesn't find any results at all, and I'm not sure how to dig into this further to find out what is happening. I am testing the…
Timothy Strimple
  • 22,920
  • 6
  • 69
  • 76
5
votes
2 answers

Do stemming and fuzzy search work together in Apache Solr?

I am using the Porter filter factory for a field which has 3 to 4 words in it. E.g.: "ABC BLOSSOM COMPANY". I expect to fetch the above document when I search for ABC BLOSSOMING COMPANY as well. When I query this: name:ABC AND name:BLOSSOMING AND…
Bhavana67
  • 116
  • 8
5
votes
1 answer

Polish search for Sphinx?

I want to implement a search solution for a website written in Django. From the available options (I have researched Solr, Sphinx, Xapian, PostgreSQL/Tsearch2, MySQL) Sphinx looks like the nicest. However, it does not support stemming for Polish,…
Ryszard Szopa
  • 5,431
  • 8
  • 33
  • 43