Highest Voted 'stemming' Questions

8

votes

1 answer

SnowballStemmer for Russian words list

I know how to run SnowballStemmer on a single word (in my case, a Russian one). I do the following: from nltk.stem.snowball import SnowballStemmer stemmer = SnowballStemmer("russian") stemmer.stem("Василий") 'Васил' How can I do the…

asked Aug 15 '17 at 15:22

Keithx

2,994
15
42
71
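
The excerpt above asks about moving from one word to a list of words. A minimal sketch of that step, assuming the stemmer is passed in as a plain callable: `stem_words` and `toy_stem` are hypothetical names introduced here for illustration, and `toy_stem` is only a stand-in so the block runs without NLTK installed.

```python
from typing import Callable, Iterable, List


def stem_words(words: Iterable[str], stem: Callable[[str], str]) -> List[str]:
    """Apply a single-word stemming function to every word, preserving order."""
    return [stem(word) for word in words]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strips a trailing 'ий' (illustration only)."""
    return word[:-2] if word.endswith("ий") else word


# With NLTK installed, the stemmer from the question could be passed instead:
#   from nltk.stem.snowball import SnowballStemmer
#   stems = stem_words(my_words, SnowballStemmer("russian").stem)
```

Passing the stemmer as a callable keeps the list handling independent of which stemming library is used.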

8

votes

1 answer

Italian stemming library in java

I'm looking for a Java library or something similar to stem strings of Italian words. The goal is to compare Italian words. At the moment, words like "attacco", "attacchi", "attaccare", etc. are considered different; instead I want returned a true…

java nlp stemming snowball

asked Nov 14 '12 at 14:45

Schiawo

95
7

7

votes

2 answers

Snowball Stemmer only stems last word

I want to stem the documents in a Corpus of plain text documents using the tm package in R. When I apply the SnowballStemmer function to all documents of the corpus, only the last word of each document is stemmed.…

r stemming tm

asked Aug 31 '11 at 21:12

Christian

211
4
5

7

votes

3 answers

How do I optimize the performance of stemming and spell check in R?

I have ~1.4M documents with a median of 250 and a mean of 470 characters per document. I want to perform spell check and stemming before classifying them. Simulated document: sentence <- "We aree drivng as fast as we drove yestrday or evven…

r spell-checking stemming

asked Feb 20 '20 at 12:08

Tlatwork

1,445
12
35

7

votes

3 answers

Should I perform both lemmatization and stemming?

I'm writing a text classification system in Python. This is what I'm doing to canonicalize each token: lem, stem = WordNetLemmatizer(), PorterStemmer() for doc in corpus: for word in doc: lemma = stem.stem(lem.lemmatize(word)) The…

python machine-learning nlp nltk stemming

asked Mar 19 '18 at 01:44

James Ko

32,215
30
128
239
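
The loop in the excerpt above lemmatizes each token and then stems the lemma. A minimal sketch of that composition, assuming both steps are supplied as callables: `canonicalize`, `toy_lemmatize`, `toy_stem`, and `TOY_LEMMAS` are hypothetical names used only for illustration, and the toy functions stand in for `WordNetLemmatizer().lemmatize` and `PorterStemmer().stem` so the block runs without NLTK or its WordNet data.

```python
from typing import Callable, Dict, List

Normalizer = Callable[[str], str]


def canonicalize(
    corpus: List[List[str]], lemmatize: Normalizer, stem: Normalizer
) -> List[List[str]]:
    """Lemmatize each token, then stem the lemma, as in the question's loop."""
    return [[stem(lemmatize(word)) for word in doc] for doc in corpus]


# Hypothetical stand-ins (illustration only, not real linguistic rules).
TOY_LEMMAS: Dict[str, str] = {"mice": "mouse", "geese": "goose"}


def toy_lemmatize(word: str) -> str:
    """Look the word up in a tiny hand-written lemma table."""
    return TOY_LEMMAS.get(word, word)


def toy_stem(word: str) -> str:
    """Strip a trailing 'ing' and nothing else."""
    return word[:-3] if word.endswith("ing") else word
```

Whether running both steps is worthwhile is exactly what the question asks; the sketch only shows the order of operations it describes.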

7

votes

1 answer

SQL Server vs MySQL: CONTAINS(*,'FORMSOF(THESAURUS,word)')

I am shocked. I spent the past 3-4 days figuring out how I could implement stemming (and synonym searches) in MySQL, and now I see that in SQL Server the query is incredibly easy: Select * from tab where CONTAINS(*,'FORMSOF(THESAURUS,word)') Is it possible on…

mysql sql-server full-text-search stemming thesaurus

asked Jan 18 '11 at 18:06

dynamic

46,985
55
154
231

7

votes

2 answers

What is the best "turnkey" stemming algorithm?

I need a good stemming algorithm for a project I'm working on. It was suggested that I look at the Porter Stemmer. When I checked out the page on the Porter stemmer I found that it is deprecated now in favor of the "Snowball" stemmer. I need a good…

comparison stemming

asked Oct 22 '08 at 16:05

dicroce

45,396
28
101
140

7

votes

1 answer

Looking for a database or text file of english words with their different forms

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not good for my project. I found the phpmorphy project…

nlp stemming lemmatization

asked Aug 21 '13 at 19:31

Majid Darabi

731
6
15

6

votes

4 answers

Stemming - code examples or open source projects?

Stemming is something that's needed in tagging systems. I use delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I write software for embedded systems that would be much more…

algorithm tags nlp stemming

asked Feb 27 '09 at 15:00

Adam Davis

91,931
60
264
330

6

votes

1 answer

Does keras-tokenizer perform the task of lemmatization and stemming?

Does the Keras tokenizer provide functions such as stemming and lemmatization? If it does, then how is it done? I need an intuitive understanding. Also, what does text_to_sequence do in that?

keras nlp tokenize stemming lemmatization

asked Jun 12 '19 at 07:33

ASingh

133
1
4

6

votes

2 answers

Confused about priority between stemmer and pos tagger

So I was analyzing a text corpus and I used a stemmer on all the tokenized words. But I also have to find all the nouns in the corpus, so I then ran nltk.pos_tag(stemmed_sentence). But my question is: am I doing it right? A.]…

python nltk stemming part-of-speech

asked Dec 01 '14 at 11:11

user4197202

6

votes

1 answer

ElasticSearch Stemming

I am using Elasticsearch and I want to set up basic stemming for English. So basically, fighter returns fight or any word that contains the fight root. I am a little confused about how to implement this. I was reading through the analyzers, tokenizers and…

lucene tokenize elasticsearch analyzer stemming

asked Jul 11 '12 at 14:57

Gabbar

4,006
7
41
78

5

votes

1 answer

Lucene.NET stemming problem

I'm running into a problem using the SnowBallAnalyzer in Lucene.NET. It works great for some words, but for others it doesn't find any results at all, and I'm not sure how to dig into this further to find out what is happening. I am testing the…

lucene.net stemming

asked May 31 '11 at 19:03

Timothy Strimple

22,920
6
69
76

5

votes

2 answers

Do stemming and fuzzy search work together in Apache Solr?

I am using the Porter filter factory for a field which has 3 to 4 words in it, e.g. "ABC BLOSSOM COMPANY". I expect to fetch the above document when I search for ABC BLOSSOMING COMPANY as well. When I query this: name:ABC AND name:BLOSSOMING AND…

solr stemming fuzzy porter-stemmer

asked Mar 13 '19 at 11:00

Bhavana67

116
8

5

votes

1 answer

Polish search for Sphinx?

I want to implement a search solution for a website written in Django. From the available options (I have researched Solr, Sphinx, Xapian, PostgreSQL/Tsearch3, MySQL) Sphinx looks like the nicest. However, it does not support stemming for Polish,…

search full-text-search sphinx stemming polish

asked Feb 03 '11 at 19:05

Ryszard Szopa

5,431
8
33
43
