Questions tagged [lemmatization]

Lemmatization in linguistics is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.

436 questions
6
votes
1 answer

WordNetLemmatizer: Different handling of wn.ADJ and wn.ADJ_SAT?

I need to lemmatize text using nltk. In order to do this, I apply nltk.pos_tag to each sentence and then convert the resulting Penn Treebank tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to WordNet tags. I need to…
Simon Hessner
  • 1,757
  • 1
  • 22
  • 49
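
The tag conversion this question describes can be sketched without NLTK installed. This is a minimal sketch, not the asker's code: `penn_to_wordnet` is a hypothetical helper name, and the single-character constants are written out by hand to mirror the values of `wn.ADJ`, `wn.ADJ_SAT`, `wn.ADV`, `wn.NOUN` and `wn.VERB` in `nltk.corpus.wordnet`. Falling back to noun is an assumption that matches the default `pos` of `WordNetLemmatizer.lemmatize`.

```python
# Single-character WordNet POS constants, written out by hand so the sketch
# runs without NLTK; they mirror wn.ADJ, wn.ADJ_SAT, wn.ADV, wn.NOUN, wn.VERB.
ADJ, ADJ_SAT, ADV, NOUN, VERB = "a", "s", "r", "n", "v"


def penn_to_wordnet(tag, default=NOUN):
    """Map a Penn Treebank tag to a WordNet POS character (hypothetical helper)."""
    if tag.startswith("J"):   # JJ, JJR, JJS
        return ADJ
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return VERB
    if tag.startswith("N"):   # NN, NNS, NNP, NNPS
        return NOUN
    if tag.startswith("RB"):  # RB, RBR, RBS (but not RP, the particle tag)
        return ADV
    return default            # assumption: treat everything else as a noun


tagged = [("cats", "NNS"), ("running", "VBG"), ("better", "JJR"), ("quickly", "RB")]
wordnet_tags = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
```

Penn Treebank has no tag that distinguishes satellite adjectives, so a mapping like this one never produces `ADJ_SAT`.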
6
votes
2 answers

How to lemmatize a list of sentences

How can I lemmatize a list of sentences in Python? from nltk.stem.wordnet import WordNetLemmatizer a = ['i like cars', 'cats are the best'] lmtzr = WordNetLemmatizer() lemmatized = [lmtzr.lemmatize(word) for word in a] print(lemmatized) This is…
user9886692
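
The snippet in this excerpt passes each whole sentence to `lemmatize`, which expects a single word. One possible fix, sketched here under assumptions, is to split each sentence into words first. `lemmatize_sentences` and `toy_lemmatize` are hypothetical names; the toy lookup table only stands in for a real lemmatizer so the sketch runs without NLTK or its WordNet data. With NLTK available, `WordNetLemmatizer().lemmatize` could be passed in instead.

```python
def lemmatize_sentences(sentences, lemmatize):
    """Apply `lemmatize` to every whitespace-separated word of every sentence."""
    return [
        " ".join(lemmatize(word) for word in sentence.split())
        for sentence in sentences
    ]


# Toy stand-in for a real lemmatizer (assumption, for illustration only).
TOY_LEMMAS = {"cars": "car", "cats": "cat"}


def toy_lemmatize(word):
    # Return the known lemma, or the word unchanged if it is not in the table.
    return TOY_LEMMAS.get(word, word)


a = ['i like cars', 'cats are the best']
lemmatized = lemmatize_sentences(a, toy_lemmatize)
print(lemmatized)
```

Splitting on whitespace is the simplest tokenization; a proper tokenizer would handle punctuation better.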
6
votes
3 answers

Simplest method for text lemmatization in Scala and Spark

I want to use lemmatization on a text file: surprise heard thump opened door small seedy man clasping package wrapped. upgrading system found review spring 2008 issue moody audio backed. omg left gotta wrap review order asap . understand hand…
Rozita
  • 289
  • 1
  • 6
  • 13
6
votes
1 answer

How to use GermaNet (WordNet German correspondent) with R

I want to use GermaNet for lemmatization (corresponding to getLemma() in WordNet) of a list of terms (actually DTM terms, to improve text classification performance). But I couldn't find any hint or R package for GermaNet. Is it somehow possible…
alex
  • 1,103
  • 1
  • 14
  • 25
5
votes
2 answers

Building a lemmatizer: speed optimization

I am building a lemmatizer in Python. Since I need it to run in real time and process a fairly large amount of data, processing speed is of the essence. Data: I have all possible suffixes, each linked to all the word types it can be combined with.…
root
  • 76,608
  • 25
  • 108
  • 120
5
votes
1 answer

Efficient Lemmatizer that avoids dictionary lookup

I want to convert a string like 'eat' to 'eating', 'eats'. I searched and found lemmatization as the solution, but all the lemmatizer tools that I have come across use a wordlist or dictionary lookup. Is there any lemmatizer which avoids dictionary…
ameykpatil
  • 581
  • 1
  • 7
  • 21
5
votes
1 answer

Got Argument 'other' has incorrect type (expected spacy.tokens.token.Token, got str)

I was getting the following error while I was trying to read a list in spaCy. TypeError: Argument 'string' has incorrect type (expected spacy.tokens.token.Token, got str) Here is the code below f= "MotsVides.txt" file= open(f, 'r',…
kely789456123
  • 605
  • 1
  • 6
  • 21
5
votes
2 answers

How to convert plural nouns to singular using SpaCy?

I am using SpaCy to lemmatize text, but in some special cases I need to keep original text and just convert plural nouns to their singular forms. Is there a way to tell SpaCy to only convert plural nouns to singulars without lemmatizing the whole…
Nina
  • 508
  • 4
  • 21
5
votes
2 answers

AttributeError: type object 'spacy.syntax.nn_parser.array' has no attribute '__reduce_cython__' , (adding Paths to virtual environments)

Overall problem: I am working on an NLP project and want to use spaCy. But when trying to load the language for an nlp object, I keep running into an error: AttributeError: type object 'spacy.syntax.nn_parser.array' has no attribute…
schedoozle
  • 51
  • 3
5
votes
3 answers

Lemmatize a doc with spacy?

I have a spaCy doc that I would like to lemmatize. For example: import spacy nlp = spacy.load('en_core_web_lg') my_str = 'Python is the greatest language in the world' doc = nlp(my_str) How can I convert every token in the doc to its lemma?
max
  • 4,141
  • 5
  • 26
  • 55
5
votes
1 answer

Lemmatization on CountVectorizer doesn't remove Stopwords

I'm trying to add lemmatization to CountVectorizer from scikit-learn, as follows: import nltk from pattern.es import lemma from nltk import word_tokenize from nltk.corpus import stopwords from sklearn.feature_extraction.text import CountVectorizer from…
ambigus9
  • 1,417
  • 3
  • 19
  • 37
5
votes
1 answer

Detect stopword after lemma in Spacy

How can I detect whether a word is a stopword after stemming and lemmatization in spaCy? Assume the sentence s = "something good\nsomethings 2 bad". In this case something is a stopword. Obviously (to me?) Something and somethings are also stopwords, but it needs…
Dawid Laszuk
  • 1,773
  • 21
  • 39
5
votes
1 answer

How to lemmatize strings in pandas dataframes?

I have a pandas dataframe where I need to lemmatize the words in two of the columns. I am using spaCy for this. import spacy nlp = spacy.load("en") I am trying to use lemmatization based on this example (which works perfectly…
Mia
  • 559
  • 4
  • 9
  • 21
5
votes
1 answer

Compute word n-grams on original text or after lemma/stemming process?

I'm thinking about using word n-gram techniques on raw text. But I have a doubt: does it make sense to compute word n-grams after applying lemmatization or stemming to the text? If not, why should I compute word n-grams only on raw files? What are the pros and cons?
5
votes
2 answers

Why are there different Lemmatizers in NLTK library?

>> from nltk.stem import WordNetLemmatizer as lm1 >> from nltk import WordNetLemmatizer as lm2 >> from nltk.stem.wordnet import WordNetLemmatizer as lm3 For me all three work the same way, but just to confirm, do they provide anything…
Abhishek
  • 3,337
  • 4
  • 32
  • 51