
Extension to the use case here - NLTK words lemmatizing

I have NLTK installed on my computer (with all modules and corpora from the book). My use case is to explore and contrast some lemmatization and stemming approaches for my dataset (I tried the Porter stemmer, which worked).

I was trying to use lemmatization with WordNet as described by @Chthonic Project in "NLTK words lemmatizing". However, the source code it points to (see http://nltk.org/_modules/nltk/app/wordnet_app.html) needs the compat module from nltk.

from nltk import compat
ImportError: cannot import name compat

I googled the import error for compat (it looks like it is short for "compatibility"?) and here's what I tried on my Ubuntu box:

sudo find . -name compat*, which returns the files listed below. I also tried sudo find -name "trac" -type d, which returns nothing.

According to the source below, I should have found some modules under "trac/tests/functional/fixes" in a folder similar to /usr/lib/python2.4/site-packages/Trac-0.11.1-py2.4.egg/trac/tests/functional/

Source : http://biodegradablegeek.com/2008/08/workaround-for-importerror-cannot-import-name-compat-issue-in-trac-011x/#sthash.NhAThk6e.dpuf

Questions:

1. What am I missing? And is this an issue with trac/tests?

2. Is there a way to use WordNet for lemmatization? (from nltk.corpus import wordnet as wn works just fine.) Once the import error is solved, how do I use this module: http://nltk.org/_modules/nltk/app/wordnet_app.html? (I was trying to build the source locally from this page, i.e. the file browserver.py, when I hit the import error with compat.)

Tip: If you are providing a solution, please also mention how to solve this in my Windows environment (I use both Windows and Ubuntu interchangeably, depending on context).

Files I see from find . -name compat*:

ekta@ekta-VirtualBox:/usr/lib/python2.7$ sudo find . -name compat*
./dist-packages/numpy/numarray/compat.pyc
./dist-packages/numpy/numarray/compat.py
./dist-packages/numpy/distutils/compat.pyc
./dist-packages/numpy/distutils/compat.py
./dist-packages/numpy/compat
./dist-packages/numpy/oldnumeric/compat.pyc
./dist-packages/numpy/oldnumeric/compat.py
./dist-packages/twisted/python/compat.pyc
./dist-packages/twisted/python/compat.py
./dist-packages/gtk-2.0/gtk/compat.pyc
./dist-packages/gtk-2.0/gtk/compat.py

I am on Python 2.7.

ekta

1 Answer


Lemmatizing using WordNet (Morphy, actually) in NLTK is simple:

from nltk.corpus import wordnet as wn

wn.morphy('runs') # "run"
wn.morphy('leaves') # "leaf"

wordnet_app is a WordNet browser, not the NLTK WordNet API: you don't need it! Chthonic Project was talking about derivationally related forms, not lemmatizing, which are two different things.

By the way, the issue you had with wordnet_app and compat is that you copied a recent version of the file, which was incompatible with your NLTK distribution (compat is a recent NLTK module, inspired by six, that helps the transition to Python 3). If you need wordnet_app, don't copy the source; simply use the version in your NLTK distribution!
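As a quick check that behaves the same on Ubuntu and Windows, here is a minimal sketch that asks Python directly whether the installed NLTK ships a compat module, instead of searching the filesystem with find. The helper name can_import is hypothetical (it is not part of NLTK); it only relies on the standard library's importlib.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly in this environment.

    can_import is an illustrative helper, not an NLTK function.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Report which of the modules discussed above are importable here.
for name in ("nltk", "nltk.compat"):
    status = "available" if can_import(name) else "missing"
    print(name, "->", status)
```

If nltk is available but nltk.compat is missing, the installed NLTK predates the file that was copied, which matches the explanation above.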

Quentin Pradet
  • Follow-up questions: 1) If I were to use derivationally related forms, will I get a single source, so that "Analysis", "Analyst", "Analyze" all have the same "stem" (or is that a different question altogether)? 2) How do I use wordnet_app on Python 2.7? Could you give a simple example? 3) wn.morphy('analysis'), wn.morphy('analyst'), wn.morphy('analyze') don't solve my problem here. I need the "stem" (and was assuming lemmatization should give me that?) – ekta Aug 26 '13 at 11:29
  • re. Q1: I believe that derivationally related forms are more of a network than a tree: they don't all come down to one single source word, but rather they're related words (even if they have a common lexical root that may not be a word by itself). re. Q3: for clarification on stemming vs. lemmatizing (morphy() lemmatizes), take a look at http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html, especially the paragraph starting with "However, the two words differ in their flavor. Stemming usually refers to...". Given that, please clarify your question! :) – arturomp Aug 26 '13 at 12:05
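
To see what a stemmer actually does with the words from the follow-up question, here is a minimal sketch using NLTK's PorterStemmer, which needs no corpus download. The word list is only illustrative. A stemmer strips suffixes by rule and may return strings that are not dictionary words, so related words are not guaranteed to collapse to one shared stem; running this shows what each word maps to.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words taken from the discussion above.
words = ["runs", "analysis", "analyst", "analyze"]

# Map each word to its Porter stem.
stems = {word: stemmer.stem(word) for word in words}

for word in words:
    print(word, "->", stems[word])
```

Comparing this output with wn.morphy() on the same words is one way to contrast stemming and lemmatizing on your own dataset.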