Why are there different Lemmatizers in NLTK library?

Question

>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3

For me, all three work the same way, but just to confirm: do they provide anything different?

score 5 · Accepted Answer · edited May 23 '17 at 10:32

No, they're not different; they're all the same.

>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3
>>> lm1 == lm2
True
>>> lm2 == lm3
True
>>> lm1 == lm3
True

As erip pointed out, this works because:

The class WordNetLemmatizer is originally defined in nltk.stem.wordnet, so you can do from nltk.stem.wordnet import WordNetLemmatizer as lm3

It is also imported in nltk's top-level __init__.py file, so you can do from nltk import WordNetLemmatizer as lm2

And it is also imported in the __init__.py of the nltk.stem package, so you can do from nltk.stem import WordNetLemmatizer as lm1
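The same pattern can be tried without NLTK installed. The sketch below uses the standard library's json package, which re-exports JSONDecoder from json.decoder in its own __init__.py; the aliases dec1 and dec2 are only illustrative. It uses `is`, which checks object identity and is a stricter test than `==`.

```python
from json import JSONDecoder as dec1          # re-exported by json/__init__.py
from json.decoder import JSONDecoder as dec2  # the module that defines the class

# Both aliases are bound to one and the same class object.
print(dec1 is dec2)

# __module__ records where the class was defined, whichever path imported it.
print(dec1.__module__)
```

Checking `__module__` on WordNetLemmatizer should, by the same reasoning, report nltk.stem.wordnet for all three import paths.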

answered Nov 09 '16 at 19:27

harshil9968


3

Your last point is incorrect. NLTK uses `__init__.py` to hide this. Has nothing to do with the efficiency of the language's importing mechanism. See [here](https://github.com/nltk/nltk/blob/develop/nltk/__init__.py#L137), [here](https://github.com/nltk/nltk/blob/develop/nltk/stem/__init__.py#L30), and [here](https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15). – erip Nov 09 '16 at 19:29

alvas · Answer 2 · 2016-11-10T01:14:31.530

Answer: They are all the same.

The inspect module is a helpful tool for checking whether objects are the same:

>>> import inspect
>>> from nltk.stem import WordNetLemmatizer as wnl1
>>> from nltk.stem.wordnet import WordNetLemmatizer as wnl2
>>> inspect.getfile(wnl1)
'/Library/Python/2.7/site-packages/nltk/stem/wordnet.pyc'
# They come from the same file:
>>> inspect.getfile(wnl1) == inspect.getfile(wnl2)
True
>>> print(inspect.getdoc(wnl1))
WordNet Lemmatizer

Lemmatize using WordNet's built-in morphy function.
Returns the input word unchanged if it cannot be found in WordNet.

    >>> from nltk.stem import WordNetLemmatizer
    >>> wnl = WordNetLemmatizer()
    >>> print(wnl.lemmatize('dogs'))
    dog
    >>> print(wnl.lemmatize('churches'))
    church
    >>> print(wnl.lemmatize('aardwolves'))
    aardwolf
    >>> print(wnl.lemmatize('abaci'))
    abacus
    >>> print(wnl.lemmatize('hardrock'))
    hardrock

You can check the source code too:

>>> print(inspect.getsource(wnl1))
class WordNetLemmatizer(object):
    """
    WordNet Lemmatizer

    Lemmatize using WordNet's built-in morphy function.
    Returns the input word unchanged if it cannot be found in WordNet.

        >>> from nltk.stem import WordNetLemmatizer
        >>> wnl = WordNetLemmatizer()
        >>> print(wnl.lemmatize('dogs'))
        dog
        >>> print(wnl.lemmatize('churches'))
        church
        >>> print(wnl.lemmatize('aardwolves'))
        aardwolf
        >>> print(wnl.lemmatize('abaci'))
        abacus
        >>> print(wnl.lemmatize('hardrock'))
        hardrock
    """

    def __init__(self):
        pass

    def lemmatize(self, word, pos=NOUN):
        lemmas = wordnet._morphy(word, pos)
        return min(lemmas, key=len) if lemmas else word

    def __repr__(self):
        return '<WordNetLemmatizer>'

# They have the same source code too:
>>> print(inspect.getsource(wnl1) == inspect.getsource(wnl2))
True
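If NLTK is not at hand, the same inspect checks can be tried on a standard-library class that is re-exported in a similar way. This is a minimal sketch, assuming unittest.TestCase as the stand-in; the helper name defining_module and the aliases tc1 and tc2 are made up for illustration.

```python
import inspect
from unittest import TestCase as tc1       # re-exported by unittest/__init__.py
from unittest.case import TestCase as tc2  # the module that defines the class


def defining_module(obj):
    """Return the dotted name of the module in which obj was defined."""
    return inspect.getmodule(obj).__name__


# The defining module is the same whichever import path was used.
print(defining_module(tc1))

# Both aliases resolve to the same file on disk.
print(inspect.getfile(tc1) == inspect.getfile(tc2))
```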

The structure of the imports in NLTK for the WordNetLemmatizer looks like this:

\nltk
    __init__.py
    \stem
        __init__.py
        wordnet.py     # This is where the WordNetLemmatizer code resides.

We start at the lowest level: WordNetLemmatizer is defined in nltk/stem/wordnet.py (https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15), so you can do:

from nltk.stem.wordnet import WordNetLemmatizer

In nltk/stem/__init__.py, we see the above import at https://github.com/nltk/nltk/blob/develop/nltk/stem/__init__.py#L30, which lets nltk.stem expose WordNetLemmatizer, so you can do:

from nltk.stem import WordNetLemmatizer

In nltk/__init__.py, we see:

from nltk.stem import *

That allows the top-level nltk package to access everything that nltk.stem has access to. So at the top level, we can do:

from nltk import WordNetLemmatizer
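To see the three-level re-export in isolation, here is a minimal sketch that writes a throwaway package to a temporary directory and imports it. The package name toypkg and the class name Lemmatizer are hypothetical; only the layout mirrors the nltk / nltk.stem / nltk.stem.wordnet structure described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, mirroring nltk -> nltk.stem -> nltk.stem.wordnet.
FILES = {
    "toypkg/__init__.py": "from toypkg.stem import *\n",
    "toypkg/stem/__init__.py": "from toypkg.stem.wordnet import Lemmatizer\n",
    "toypkg/stem/wordnet.py": "class Lemmatizer:\n    pass\n",
}


def build_and_import(root):
    """Write the toy package under root and fetch the class via all three paths."""
    for relative_path, source in FILES.items():
        target = Path(root) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        top = importlib.import_module("toypkg").Lemmatizer
        mid = importlib.import_module("toypkg.stem").Lemmatizer
        low = importlib.import_module("toypkg.stem.wordnet").Lemmatizer
    finally:
        sys.path.remove(str(root))
    return top, mid, low


with tempfile.TemporaryDirectory() as tmp:
    top, mid, low = build_and_import(tmp)

print(top is mid is low)  # one class object, reachable under three names
print(low.__module__)     # the module that actually defines the class
```

Each __init__.py only binds an extra name to the existing class; no copy of the class is made at any level.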

One thing to note, though: it's NOT always the case that objects/modules with the same name refer to the same object in NLTK, e.g.:

>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> wn1 == wn2
False

>>> wn1.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

>>> wn2.synsets('dog')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'synsets'

The first wordnet, wn1, is a LazyCorpusLoader object that opens the wordnet files in nltk_data and lets you access the synsets: https://github.com/nltk/nltk/blob/develop/nltk/corpus/__init__.py#L246

The second, wn2, is the module wordnet.py itself, which resides at nltk/corpus/reader/wordnet.py: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py

It gets even trickier when:

>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> from nltk.stem import wordnet as wn3
>>> wn3 == wn1
False
>>> wn3 == wn2
False

In the case of wn3, it refers to the module nltk/stem/wordnet.py, which contains the WordNetLemmatizer and has nothing to do with the wordnet corpus object or the corpus reader for wordnet.
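A quick way to tell such same-named objects apart is to ask what kind of object each name is bound to. The sketch below uses the standard library's datetime, where one name also covers both a module and a class, as a stand-in for the wordnet case; the helper kind and the aliases dt_module and dt_class are illustrative only.

```python
import inspect
import datetime as dt_module               # the module named datetime
from datetime import datetime as dt_class  # the class named datetime


def kind(obj):
    """Classify obj as a module, a class, or an instance of some class."""
    if inspect.ismodule(obj):
        return "module"
    if inspect.isclass(obj):
        return "class"
    return "instance of " + type(obj).__name__


print(kind(dt_module))
print(kind(dt_class))

# Same name, different objects: only the class has a now() method.
print(hasattr(dt_module, "now"), hasattr(dt_class, "now"))
```

Going by the description above, calling kind on wn1, wn2 and wn3 would presumably report an instance for wn1 and a module for wn2 and wn3, though that has not been run here.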
