I need to lemmatize text using NLTK. To do this, I apply nltk.pos_tag
to each tokenized sentence and then convert the resulting Penn Treebank tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html) to WordNet tags. I need to do this because WordNetLemmatizer.lemmatize()
expects both the word and its correct POS tag as arguments; otherwise it falls back to its default and treats every word as a noun.
I just found that there are five different tags defined in WordNet:
- wn.VERB
- wn.ADV
- wn.NOUN
- wn.ADJ
- wn.ADJ_SAT
However, every example I found on the internet ignores wn.ADJ_SAT when converting Treebank tags to WordNet tags. They all map Penn tags to WordNet tags like this:
- If Penn tag starts with J: convert to wn.ADJ
- If Penn tag starts with V: convert to wn.VERB
- If Penn tag starts with N: convert to wn.NOUN
- If Penn tag starts with R: convert to wn.ADV
So wn.ADJ_SAT is never used.
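For reference, the mapping those examples use can be sketched roughly as below. The function name penn_to_wordnet and the noun fallback for all other tags are my own assumptions for illustration, and the constants are written as plain strings that mirror the values of wn.ADJ, wn.ADJ_SAT, wn.ADV, wn.NOUN and wn.VERB so that the snippet runs without the WordNet data installed.

```python
# Plain-string stand-ins for the nltk.corpus.wordnet constants
# (wn.ADJ, wn.ADJ_SAT, wn.ADV, wn.NOUN, wn.VERB).
ADJ, ADJ_SAT, ADV, NOUN, VERB = "a", "s", "r", "n", "v"


def penn_to_wordnet(penn_tag, default=NOUN):
    """Map a Penn Treebank tag to a WordNet POS by its first letter.

    `default` is a hypothetical fallback for tags outside the four
    groups; NOUN is chosen here because it matches the lemmatizer's
    own default.
    """
    if penn_tag.startswith("J"):
        return ADJ
    if penn_tag.startswith("V"):
        return VERB
    if penn_tag.startswith("N"):
        return NOUN
    if penn_tag.startswith("R"):
        return ADV
    return default


# ADJ_SAT is defined above but no branch ever returns it.
for tag in ("JJ", "JJR", "VBD", "NNS", "RB", "DT"):
    print(tag, "->", penn_to_wordnet(tag))
```

With this mapping every Penn adjective tag (JJ, JJR, JJS) ends up as ADJ, which is exactly why ADJ_SAT never appears.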
My question is whether there are cases where the lemmatizer returns a different result for ADJ_SAT than for ADJ. What are examples of words that are satellite adjectives (ADJ_SAT) but not normal adjectives (ADJ)?
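One way I could probe this empirically is a small comparison helper like the sketch below. The name lemma_differences is hypothetical, and the helper assumes it is given a callable with the same (word, pos) signature as WordNetLemmatizer.lemmatize, so it makes no claim about what the real lemmatizer returns.

```python
def lemma_differences(words, lemmatize):
    """Collect words whose lemma differs between POS 'a' and POS 's'.

    `lemmatize` is any callable taking (word, pos), for example the
    bound method WordNetLemmatizer().lemmatize from NLTK.
    Returns a list of (word, adj_lemma, sat_lemma) tuples.
    """
    differences = []
    for word in words:
        adj_lemma = lemmatize(word, "a")  # value of wn.ADJ
        sat_lemma = lemmatize(word, "s")  # value of wn.ADJ_SAT
        if adj_lemma != sat_lemma:
            differences.append((word, adj_lemma, sat_lemma))
    return differences
```

Calling lemma_differences(word_list, WordNetLemmatizer().lemmatize) on a list of adjectives would then list any words for which the two tags disagree; an empty result would suggest the distinction does not matter for lemmatization on that list.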