Here's the full error traceback if a language is missing from the Open Multilingual WordNet in your nltk_data
directory:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0].lemma_names('spa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 418, in lemma_names
self._wordnet_corpus_reader._load_lang_data(lang)
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1070, in _load_lang_data
f = self._omw_reader.open('{0:}/wn-data-{0:}.tab'.format(lang))
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/api.py", line 198, in open
stream = self._root.join(file).open(encoding)
File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 309, in join
return FileSystemPathPointer(_path)
File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
return init_func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 287, in __init__
raise IOError('No such file or directory: %r' % _path)
IOError: No such file or directory: u'/home/alvas/nltk_data/corpora/omw/spa/wn-data-spa.tab'
So the first thing is to check whether the omw package is already installed:
>>> import nltk
>>> nltk.download('omw')
[nltk_data] Downloading package omw to /home/alvas/nltk_data...
[nltk_data] Package omw is already up-to-date!
Then check your nltk_data directory, and you will find that the 'spa' folder is missing:
alvas@ubi:~/nltk_data/corpora/omw$ ls
als arb cmn dan eng fas fin fra fre heb ita jpn mcr msa nor pol por README tha
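Rather than eyeballing the directory listing, you can check programmatically which OMW languages have their `.tab` file in place. This is a minimal sketch (the helper name `installed_omw_langs` is mine, not part of NLTK), assuming the standard `~/nltk_data/corpora/omw` layout where each language lives in its own subfolder:

```python
import os

def installed_omw_langs(omw_dir):
    """Return the set of language codes that have a wn-data-<lang>.tab file."""
    langs = set()
    if not os.path.isdir(omw_dir):
        return langs
    for lang in os.listdir(omw_dir):
        tab = os.path.join(omw_dir, lang, 'wn-data-{0}.tab'.format(lang))
        if os.path.isfile(tab):
            langs.add(lang)
    return langs

omw_dir = os.path.expanduser('~/nltk_data/corpora/omw')
print('spa' in installed_omw_langs(omw_dir))
```

If this prints `False`, the Spanish data is missing and you need the fix below.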
So here's the short term solution:
$ wget http://compling.hss.ntu.edu.sg/omw/wns/spa.zip
$ mkdir ~/nltk_data/corpora/omw/spa
$ unzip -p spa.zip mcr/wn-data-spa.tab > ~/nltk_data/corpora/omw/spa/wn-data-spa.tab
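The same steps can be scripted in Python; here is a sketch of the unzip-and-place part (the function name `extract_tab` is mine, and the member path `mcr/wn-data-spa.tab` is the one used in the shell commands above — you would still download spa.zip first):

```python
import os
import zipfile

def extract_tab(zip_path, member, dest_path):
    """Extract a single wn-data-*.tab member from an OMW zip into nltk_data."""
    dest_dir = os.path.dirname(dest_path)
    if dest_dir and not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        with open(dest_path, 'wb') as out:
            out.write(zf.read(member))

# Usage, mirroring the shell commands (assumes spa.zip was already downloaded):
# extract_tab('spa.zip', 'mcr/wn-data-spa.tab',
#             os.path.expanduser('~/nltk_data/corpora/omw/spa/wn-data-spa.tab'))
```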
Alternatively, you can simply copy the file from nltk_data/corpora/omw/mcr/wn-data-spa.tab into the new spa folder.
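That copy can also be done from Python; a small sketch (the helper name `copy_omw_tab` is mine), assuming the tab file already exists in another OMW subfolder such as mcr:

```python
import os
import shutil

def copy_omw_tab(omw_dir, src_subdir, lang):
    """Copy wn-data-<lang>.tab from another OMW subfolder into its own folder."""
    src = os.path.join(omw_dir, src_subdir, 'wn-data-{0}.tab'.format(lang))
    dest_dir = os.path.join(omw_dir, lang)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    shutil.copy(src, dest_dir)
    return os.path.join(dest_dir, 'wn-data-{0}.tab'.format(lang))

# Usage:
# copy_omw_tab(os.path.expanduser('~/nltk_data/corpora/omw'), 'mcr', 'spa')
```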
[out]:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0].lemma_names('spa')
[u'margen', u'orilla', u'vera']
Now lemma_names() should work for Spanish. If you're looking for other languages from the Open Multilingual Wordnet, you can browse http://compling.hss.ntu.edu.sg/omw/, then download the data and put it in the respective nltk_data directory.
The long-term solution would be to ask the NLTK and OMW developers to update their datasets for the NLTK API.