Lemmatization in linguistics is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.
Questions tagged [lemmatization]
436 questions
6 votes, 1 answer
WordNetLemmatizer: Different handling of wn.ADJ and wn.ADJ_SAT?
I need to lemmatize text using NLTK. To do this, I apply nltk.pos_tag to each sentence and then convert the resulting Penn Treebank tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to WordNet tags. I need to…

Simon Hessner
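The Penn Treebank to WordNet conversion described in this question can be sketched as a lookup on the first letter of the tag. This is a minimal sketch, not the asker's code. The letters 'a', 'r', 'n' and 'v' are the values of NLTK's `wordnet.ADJ`, `wordnet.ADV`, `wordnet.NOUN` and `wordnet.VERB`. The function name `penn_to_wordnet` and the fallback to noun are illustrative assumptions. Penn Treebank tags do not distinguish satellite adjectives, so a mapping from `pos_tag` output can only ever yield `ADJ` and never `ADJ_SAT` ('s').

```python
from typing import Optional

# These letters are the values of NLTK's wordnet.ADJ, wordnet.ADV,
# wordnet.NOUN and wordnet.VERB constants. wordnet.ADJ_SAT ('s') is never
# returned: Penn Treebank tags do not distinguish satellite adjectives.
WN_ADJ, WN_ADV, WN_NOUN, WN_VERB = "a", "r", "n", "v"

# Map the first letter of a Penn Treebank tag to a WordNet POS letter
_PREFIX_TO_WN = {"J": WN_ADJ, "R": WN_ADV, "N": WN_NOUN, "V": WN_VERB}


def penn_to_wordnet(tag: str, default: Optional[str] = WN_NOUN) -> Optional[str]:
    """Convert a Penn Treebank tag such as 'JJR' or 'VBD' to a WordNet POS.

    Tags with no WordNet counterpart (e.g. 'DT', 'IN') get `default`.
    """
    return _PREFIX_TO_WN.get(tag[:1], default)


for tag in ["JJ", "JJS", "VBD", "NNS", "RB", "DT"]:
    print(tag, "->", penn_to_wordnet(tag))
```

With NLTK, the result would be passed as the `pos` argument, for example `WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag))` for each `(word, tag)` pair returned by `nltk.pos_tag`.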
6 votes, 2 answers
How to lemmatize a list of sentences
How can I lemmatize a list of sentences in Python?
from nltk.stem.wordnet import WordNetLemmatizer
a = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
lemmatized = [lmtzr.lemmatize(word) for word in a]
print(lemmatized)
This is…
user9886692
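The list comprehension in this question passes each whole sentence to `lemmatize()`, which expects a single word. Below is a minimal sketch of the per-word approach, assuming simple whitespace tokenization. The lemmatizer is passed in as a function. The small lookup table is a hypothetical stand-in used only so the example runs without WordNet data.

```python
from typing import Callable, Dict, List


def lemmatize_sentences(sentences: List[str],
                        lemmatize: Callable[[str], str]) -> List[str]:
    """Split each sentence into words, lemmatize every word, and rejoin."""
    return [" ".join(lemmatize(word) for word in sentence.split())
            for sentence in sentences]


# Hypothetical stand-in lemmatizer: a tiny lookup table, for illustration only
TOY_LEMMAS: Dict[str, str] = {"cars": "car", "cats": "cat"}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


a = ['i like cars', 'cats are the best']
print(lemmatize_sentences(a, toy_lemmatize))
```

With NLTK, you would pass the bound method `WordNetLemmatizer().lemmatize` instead of `toy_lemmatize`. Note that `lemmatize()` treats every word as a noun unless a `pos` argument is given. Tokenizing with `nltk.word_tokenize` and supplying POS tags would therefore give better results.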
6 votes, 3 answers
Simplest method for text lemmatization in Scala and Spark
I want to use lemmatization on a text file:
surprise heard thump opened door small seedy man clasping package wrapped.
upgrading system found review spring 2008 issue moody audio backed.
omg left gotta wrap review order asap . understand hand…

Rozita
6 votes, 1 answer
How to use GermaNet (the German counterpart of WordNet) with R
I want to use GermaNet for lemmatization (corresponding to getLemma() in WordNet) of a list (actually DTM terms, to improve text classification performance). However, I couldn't find any hint or R package for GermaNet. Is it somehow possible…

alex
5 votes, 2 answers
Building a lemmatizer: speed optimization
I am building a lemmatizer in Python. Because it needs to run in real time and process fairly large amounts of data, processing speed is of the essence.
Data: I have all possible suffixes, each linked to all the word types it can be combined with.…

root
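One common way to keep a suffix-based lemmatizer fast is to index the suffixes in a dict. For each word, you then probe only that word's own endings, longest first. The cost per word then depends on the maximum suffix length rather than on the number of suffixes, and caching repeated words helps further. This is a minimal sketch under those assumptions. `SUFFIX_RULES` and `LEXICON` are hypothetical toy data. The word types mentioned in the question are left out for brevity, although they could be stored alongside each rule.

```python
from functools import lru_cache

# Hypothetical toy data: suffix -> replacement applied when the suffix is stripped
SUFFIX_RULES = {"ies": "y", "es": "", "s": "", "ing": "", "ed": ""}
# Hypothetical toy lexicon of known lemmas used to accept a candidate
LEXICON = frozenset({"city", "box", "car", "walk", "jump"})
# Longest suffix length: bounds the number of dict probes per word
MAX_SUFFIX_LEN = max(len(s) for s in SUFFIX_RULES)


@lru_cache(maxsize=100_000)  # repeated words are answered from the cache
def lemmatize(word: str) -> str:
    if word in LEXICON:
        return word
    # Probe the word's own endings, longest first, instead of looping over
    # every suffix: at most MAX_SUFFIX_LEN O(1) dict lookups per word
    for n in range(min(MAX_SUFFIX_LEN, len(word) - 1), 0, -1):
        replacement = SUFFIX_RULES.get(word[-n:])
        if replacement is not None:
            candidate = word[:-n] + replacement
            if candidate in LEXICON:
                return candidate
    return word


print([lemmatize(w) for w in ["cities", "boxes", "cars", "walked", "jumping", "unknown"]])
```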
5 votes, 1 answer
Efficient Lemmatizer that avoids dictionary lookup
I want to convert a string like 'eat' to 'eating' and 'eats'. I searched and found lemmatization as the solution, but all the lemmatizer tools I have come across use a wordlist or dictionary lookup. Is there any lemmatizer that avoids dictionary…

ameykpatil
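Going from 'eat' to 'eating' and 'eats' is inflection (generation), which is the reverse direction of lemmatization. For regular English verbs, a purely rule-based sketch that needs no dictionary is possible. It is a deliberately approximate illustration, and the function names are hypothetical. Irregular verbs and spelling changes such as consonant doubling are not handled: 'run' becomes 'runing' and 'go' becomes 'gos'.

```python
VOWELS = frozenset("aeiou")


def third_person_singular(verb: str) -> str:
    """Regular -s form: 'eat' -> 'eats', 'watch' -> 'watches', 'carry' -> 'carries'."""
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"
    return verb + "s"


def present_participle(verb: str) -> str:
    """Regular -ing form: 'eat' -> 'eating', 'make' -> 'making', 'lie' -> 'lying'."""
    if verb.endswith("ie"):
        return verb[:-2] + "ying"
    # Drop a silent final 'e', but keep it in endings like 'see', 'dye', 'toe'
    if verb.endswith("e") and not verb.endswith(("ee", "ye", "oe")):
        return verb[:-1] + "ing"
    return verb + "ing"


print(third_person_singular("eat"), present_participle("eat"))
```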
5 votes, 1 answer
Got Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got str)
I was getting the following error while I was trying to read a list in spaCy:
TypeError: Argument 'string' has incorrect type (expected spacy.tokens.token.Token, got str)
Here is the code:
f = "MotsVides.txt"
file = open(f, 'r',…

kely789456123
5 votes, 2 answers
How to convert plural nouns to singular using spaCy?
I am using spaCy to lemmatize text, but in some special cases I need to keep the original text and just convert plural nouns to their singular forms.
Is there a way to tell spaCy to only convert plural nouns to singular without lemmatizing the whole…

Nina
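One way to keep the original text and singularize only plural nouns is to replace just the tokens tagged NNS (the Penn Treebank tag for plural common nouns) with their lemma. Every other token is copied verbatim. This is a minimal sketch, assuming the tokens have already been tagged. It works on plain tuples so that it runs without a model. The helper name and the example tokens are hand-written for illustration, not real spaCy output.

```python
from typing import Iterable, Tuple

# (text, Penn Treebank tag, lemma, trailing whitespace): the same fields a
# spaCy Token exposes as .text, .tag_, .lemma_ and .whitespace_
TokenInfo = Tuple[str, str, str, str]


def singularize_plural_nouns(tokens: Iterable[TokenInfo]) -> str:
    """Rebuild the text, replacing only plural common nouns (NNS) by their lemma."""
    parts = []
    for text, tag, lemma, ws in tokens:
        if tag == "NNS":
            # Keep an initial capital from the original token
            word = lemma[:1].upper() + lemma[1:] if text[:1].isupper() else lemma
        else:
            word = text  # every other token is copied unchanged
        parts.append(word + ws)
    return "".join(parts)


# Hand-written example tokens (not real model output)
tokens = [("Cats", "NNS", "cat", " "), ("were", "VBD", "be", " "),
          ("chasing", "VBG", "chase", " "), ("mice", "NNS", "mouse", "")]
print(singularize_plural_nouns(tokens))
```

With spaCy, the tuples could be built from a parsed `Doc` as `[(t.text, t.tag_, t.lemma_, t.whitespace_) for t in doc]`. This assumes an English pipeline whose tagger produces Penn Treebank tags, as the standard `en_core_web_*` models do.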
5 votes, 2 answers
AttributeError: type object 'spacy.syntax.nn_parser.array' has no attribute '__reduce_cython__' , (adding Paths to virtual environments)
Overall problem
I am working on an NLP project and want to use spaCy. But when I try to load the language for an nlp object, I keep running into an error:
AttributeError: type object 'spacy.syntax.nn_parser.array' has no attribute…

schedoozle
5 votes, 3 answers
Lemmatize a doc with spacy?
I have a spaCy doc that I would like to lemmatize.
For example:
import spacy
nlp = spacy.load('en_core_web_lg')
my_str = 'Python is the greatest language in the world'
doc = nlp(my_str)
How can I convert every token in the doc to its lemma?

max
5 votes, 1 answer
Lemmatization on CountVectorizer doesn't remove Stopwords
I'm trying to add lemmatization to CountVectorizer from scikit-learn, as follows:
import nltk
from pattern.es import lemma
from nltk import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from…

ambigus9
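When `CountVectorizer` is given a custom `tokenizer`, its `stop_words` list is compared against the tokens that the tokenizer returns. In other words, it is compared against lemmas rather than surface forms. This is one plausible cause of the symptom, stated here as an assumption because the question is truncated. Below is a minimal sketch of a tokenizer that filters stop words itself, both before and after lemmatizing. The lemmatizer is passed in as a function. The factory name and the toy lookup table are hypothetical.

```python
import re
from typing import Callable, Iterable, List


def make_tokenizer(lemmatize: Callable[[str], str],
                   stop_words: Iterable[str]) -> Callable[[str], List[str]]:
    """Build a tokenizer that lemmatizes words and drops stop words itself."""
    stops = {w.lower() for w in stop_words}

    def tokenize(doc: str) -> List[str]:
        words = re.findall(r"\w+", doc.lower())
        # Filter the surface forms first, then the lemmas, so a stop word is
        # removed whether or not lemmatization changes its form
        lemmas = (lemmatize(w) for w in words if w not in stops)
        return [lemma for lemma in lemmas if lemma not in stops]

    return tokenize


# Hypothetical toy lemmatizer, for illustration only
TOY_LEMMAS = {"cars": "car", "being": "be"}
tokenize = make_tokenizer(lambda w: TOY_LEMMAS.get(w, w), ["the", "be"])
print(tokenize("The cars being fast"))
```

It could then be plugged in as `CountVectorizer(tokenizer=make_tokenizer(lemma, spanish_stops))`. Here, `lemma` would be the function imported in the question and `spanish_stops` could be `stopwords.words('spanish')`. Leaving `CountVectorizer`'s own `stop_words` unset avoids a second, inconsistent filter.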
5 votes, 1 answer
Detect stopword after lemma in Spacy
How can I detect whether a word is a stopword after stemming and lemmatization in spaCy?
Assume the sentence
s = "something good\nsomethings 2 bad"
In this case, "something" is a stopword. Obviously (to me?), "Something" and "somethings" are also stopwords, but it needs…

Dawid Laszuk
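One way to treat inflected forms such as "somethings" as stop words is to check the lemma, not just the surface form, against the stop list. Below is a minimal sketch on `(text, lemma)` pairs so that it runs without a model. The helper name is hypothetical, and the pairs and stop list are hand-written for illustration.

```python
from typing import Set


def is_stop_after_lemma(text: str, lemma: str, stop_words: Set[str]) -> bool:
    """True if the surface form or its lemma is a stop word (case-insensitive)."""
    return text.lower() in stop_words or lemma.lower() in stop_words


# Hand-written (text, lemma) pairs for the question's sentence; a real model
# may lemmatize differently
pairs = [("something", "something"), ("good", "good"),
         ("somethings", "something"), ("2", "2"), ("bad", "bad")]
stops = {"something"}  # toy stop list
print([text for text, lemma in pairs if is_stop_after_lemma(text, lemma, stops)])
```

In spaCy, the arguments would be `token.text` and `token.lemma_`, and the stop set could be `nlp.Defaults.stop_words`. Whether a given model actually lemmatizes "somethings" to "something" depends on the model, so this is not guaranteed.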
5 votes, 1 answer
How to lemmatize strings in pandas dataframes?
I have a Python pandas DataFrame in which I need to lemmatize the words in two of the columns. I am using spaCy for this.
import spacy
nlp = spacy.load("en")
I am trying to use lemmatization based on this example (which works perfectly…

Mia
5 votes, 1 answer
Compute word n-grams on original text or after lemma/stemming process?
I'm thinking about using word n-gram techniques on raw text, but I have a doubt:
does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams only on raw files? What are the pros and cons?

Alessandro
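The trade-off can be seen by counting bigrams on raw tokens and on lemmatized tokens. Lemmatizing merges n-grams that differ only in inflection, which gives fewer, denser features. The cost is losing distinctions such as tense and number. Below is a minimal sketch in which the lemma sequence is hand-written for illustration rather than produced by a real lemmatizer.

```python
from collections import Counter
from typing import List, Tuple


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous word n-grams of length n, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


raw = ["the", "cat", "runs", "the", "cats", "ran"]
# Hand-written lemmas for the same tokens (illustrative, not tool output)
lemmas = ["the", "cat", "run", "the", "cat", "run"]

raw_bigrams = Counter(word_ngrams(raw, 2))
lemma_bigrams = Counter(word_ngrams(lemmas, 2))
print(len(raw_bigrams), "distinct raw bigrams")
print(len(lemma_bigrams), "distinct lemma bigrams")
```

Here, "cat runs" and "cats ran" collapse into the single feature ("cat", "run"). That helps when inflection is noise for the task and hurts when it carries signal.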
5 votes, 2 answers
Why are there different Lemmatizers in NLTK library?
>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3
For me, all three work the same way, but just to confirm: do they provide anything…

Abhishek
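The three import paths most likely refer to the same class. NLTK's package `__init__` modules re-export names from their submodules, so the same class object is reachable under several paths. The standard library shows the same mechanism: `json` re-exports `JSONDecoder` from `json.decoder`. The example below is that analogy, not NLTK code.

```python
# The json package re-exports JSONDecoder from its json.decoder submodule,
# just as a package __init__ can re-export a class defined in a submodule
import json
from json import JSONDecoder as d1
from json.decoder import JSONDecoder as d2

# One class object, reachable under several import paths
print(d1 is d2 is json.JSONDecoder)
```

With NLTK installed, running `print(lm1 is lm2 is lm3)` after the three imports in the question should print `True` if they are indeed one class.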