Highest Voted 'lemmatization' Questions

10

votes

4 answers

How to solve Spanish lemmatization problems with SpaCy?

When I try to lemmatize a Spanish CSV file with more than 60,000 words, SpaCy does not write certain words correctly. I understand that the model is not 100% accurate. However, I have not found any other solution, since NLTK does not provide a Spanish…

python spacy lemmatization

asked Mar 04 '20 at 21:30

Y4RD13

937
1
16
42

10

votes

2 answers

Lemmatization with Apache Lucene

I'm developing a text analysis project using Apache Lucene. I need to lemmatize some text (transform the words to their canonical forms). I've already written the code that performs stemming. Using it, I am able to convert the following sentence The…

java lucene nlp stemming lemmatization

asked Dec 09 '17 at 03:29

Kirill Simonov

8,257
3
18
42

9

votes

2 answers

Lemmatizing Italian sentences for frequency counting

I would like to lemmatize some Italian text in order to count word frequencies and investigate the lemmatized output further. I prefer lemmatizing to stemming because I could extract the word meaning…

python-2.7 nlp nltk stemming lemmatization

asked Jul 30 '17 at 18:41

TPPZ

4,447
10
61
106

8

votes

1 answer

How to invert the lemmatization process given a lemma and a token?

Generally, in natural language processing, we want to get the lemma of a token. For example, we can map 'eaten' to 'eat' using WordNet lemmatization. Are there any tools in Python that can invert a lemma to a certain form? For example, we map 'go' to…

python nlp nltk lemmatization

asked Aug 09 '17 at 12:08

Shifeng.Liu

105
2
7

8

votes

2 answers

Lemmatization of non-English words?

I would like to apply lemmatization to reduce the inflectional forms of words. I know that WordNet provides this functionality for English, but I am also interested in applying lemmatization to Dutch, French, Spanish and Italian words.…

python nltk information-retrieval information-extraction lemmatization

asked Mar 03 '14 at 10:31

Crista23

3,203
9
47
60

7

votes

1 answer

Wordpiece tokenization versus conventional lemmatization?

I'm looking at NLP preprocessing. At some point I want to implement a context-sensitive word embedding, as a way of discerning word sense, and I was thinking about using the output from BERT to do so. I noticed BERT uses WordPiece tokenization (for…

nlp tokenize lemmatization

asked Jul 16 '19 at 13:07

Keshinko

318
1
11

7

votes

2 answers

Analyze text (lemmatization, edit distance)

I need to analyze text to check whether it contains banned words. Suppose the blacklist contains the word "Forbid". The word has many forms. In the text, the word can appear as, for example, "forbidding", "forbidden", "forbad". To bring the word to its initial form, I…

c# nlp similarity lemmatization

asked Apr 03 '11 at 12:45

user348173

8,818
18
66
102

7

votes

1 answer

Solr/Lucene query lemmatization with context

I have successfully implemented a Czech lemmatizer for Lucene. I'm testing it with Solr and it works nicely at index time. But it doesn't work so well when used for queries, because the query parser doesn't provide any context (words before or…

solr lucene lemmatization word-sense-disambiguation query-parser

asked Oct 04 '16 at 10:13

dedek

7,981
3
38
68

7

votes

1 answer

Getting the root word using the Wordnet Lemmatizer

I need to find a common root word matched for all related words for a keyword extractor. How to convert words into the same root using the Python NLTK lemmatizer? E.g.: generalized, generalization -> general; optimal, optimized -> optimize…

python nlp nltk wordnet lemmatization

asked Sep 03 '16 at 03:10

Shanika Ediriweera

1,975
2
24
31

7

votes

1 answer

Faster Lemmatization techniques in Python

I am trying to find a faster way to lemmatize words in a list (named text) using the NLTK WordNet Lemmatizer. Apparently this is the most time-consuming step in my whole program (I used cProfile to find this out). Following is the piece of code…

python performance python-3.x nltk lemmatization

asked Jun 24 '16 at 18:21

Shivansh Singh

81
1
7

7

votes

1 answer

Why does NLTK lemmatization give the wrong output even though verb.exc contains the right value?

When I open verb.exc, I can see the entry "saw see". But when I use lemmatization in code, >>> print lmtzr.lemmatize('saw', 'v') outputs: saw. How can this happen? Did I misunderstand how to revise WordNet?

python nlp nltk wordnet lemmatization

asked Nov 08 '15 at 13:55

Leo Hsieh

351
4
12

7

votes

1 answer

Stemming unstructured text in NLTK

I tried the regex stemmer, but I get hundreds of unrelated tokens. I'm just interested in the "play" stem. Here is the code I'm working with: import nltk from nltk.book import * f = open('tupac_original.txt', 'rU') text = f.read() text1 =…

nltk tokenize text-analysis lemmatization

asked Sep 26 '13 at 18:49

user2221429

71
1
4

7

votes

1 answer

Looking for a database or text file of English words with their different forms

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not suitable for my project. I found the phpmorphy project…

nlp stemming lemmatization

asked Aug 21 '13 at 19:31

Majid Darabi

731
6
15

6

votes

2 answers

How to do lemmatization on German text?

I have a German text that I want to apply lemmatization to. If lemmatization is not possible, then I can live with stemming too. Data: This is my German text: mails=['Hallo. Ich spielte am frühen Morgen und ging dann zu einem Freund. Auf…

nlp spacy lemmatization

asked Sep 09 '19 at 15:43

PParker

1,419
2
10
25

6

votes

1 answer

Does keras-tokenizer perform the task of lemmatization and stemming?

Does the Keras tokenizer provide functions such as stemming and lemmatization? If it does, how is it done? I need an intuitive understanding. Also, what does text_to_sequence do in that?

keras nlp tokenize stemming lemmatization

asked Jun 12 '19 at 07:33

ASingh

133
1
4
