6

I need to lemmatize text using nltk. In order to do this, I apply nltk.pos_tag to each sentence and then convert the resulting Penn Treebank tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to WordNet tags. I need to do this because WordNetLemmatizer.lemmatize() expects both the word and its correct pos_tag as arguments, otherwise it will just assume everything is a verb.

I just found that there are five different tags defined in WordNet:

  • wn.VERB
  • wn.ADV
  • wn.NOUN
  • wn.ADJ
  • wn.ADJ_SAT

However, every example I found on the internet just ignores wn.ADJ_SAT when converting Treebank tags to WordNet tags. They are all just mapping Penn tags to WordNet tags like this:

  • If Penn tag starts with J: convert to wn.ADJ
  • If Penn tag starts with V: convert to wn.VERB
  • If Penn tag starts with N: convert to wn.NOUN
  • If Penn tag starts with R: convert to wn.ADV

So wn.ADJ_SAT is never used.

My question now is if there are cases where the lemmatizer returns a different result for ADJ_SAT than for ADJ. What are examples for words that are satellite adjectives (ADJ_SAT) and no normal adjectives (ADJ)?

Simon Hessner
  • 1,757
  • 1
  • 22
  • 49
  • Possible duplicate of [What part of speech does "s" stand for in WordNet synsets](https://stackoverflow.com/questions/18817396/what-part-of-speech-does-s-stand-for-in-wordnet-synsets) – John Strood Sep 21 '18 at 10:34

1 Answers1

2

The WordNetLemmatizer in NLTK does not differentiate satellite adjectives from normal adjectives.

nltk.stem.WordNetLemmatizer.lemmatize is uses "WordNet’s built-in morphy function. Returns the input word unchanged if it cannot be found in WordNet."

In WordNet, a satellite adjective--more broadly referred to as a satellite synset--is more of a semantic label used elsewhere in WordNet than a special part-of-speech in nltk.

From the wordnet glossary:

Satellite Synset: Synset in an adjective cluster representing a concept that is similar in meaning to the concept represented by its head synset .

User tripleee points out in this question the following:

adjectives are subcategorized into 'head' and 'satellite' synsets within an adjective clutser

Also, the nltk documentation for nltk.stem.WordNetLemmatizer.lemmatize assumes the default part of speech to be a noun instead of a verb, unless otherwise specified.

matt_07734
  • 347
  • 2
  • 13