15

I'm starting to program with NLTK in Python for Italian natural language processing. I've seen some simple examples of the WordNet library, which has a nice set of synsets that let you navigate from a word (for example, "dog") to its synonyms, antonyms, hyponyms, hypernyms and so on.

My question is: if I start with an Italian word (for example, "cane", which means "dog"), is there a way to navigate between synonyms, antonyms, hyponyms and so on for the Italian word, as you can for the English one? Or is there an equivalent of WordNet for the Italian language?

Thanks in advance

Frank B.

2 Answers

22

You are in luck. NLTK provides an interface to the Open Multilingual Wordnet, which does indeed include Italian among the languages it describes. Just add an argument specifying the desired language to the usual wordnet functions, e.g.:

>>> from nltk.corpus import wordnet as wn
>>> cane_lemmas = wn.lemmas("cane", lang="ita")
>>> print(cane_lemmas)
[Lemma('dog.n.01.cane'), Lemma('cramp.n.02.cane'), Lemma('hammer.n.01.cane'),
 Lemma('bad_person.n.01.cane'), Lemma('incompetent.n.01.cane')]

The synsets have English names, because they are integrated with the English wordnet. But you can navigate the web of meanings and extract the Italian lemmas for any synset you want:

>>> hypernyms = cane_lemmas[0].synset().hypernyms()
>>> print(hypernyms)
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
>>> print(hypernyms[1].lemmas(lang="ita"))
[Lemma('domestic_animal.n.01.animale_addomesticato'), 
 Lemma('domestic_animal.n.01.animale_domestico')]
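
The navigation above can be wrapped in a small helper that starts from an Italian word and returns Italian lemma names for each sense, its hypernyms and its hyponyms. This is only a sketch: the names `italian_neighbours` and `_names` are made up for illustration, and the wordnet reader is passed in as the `wn` argument (an assumption made here so the function does not need the corpus data at import time).

```python
def _names(synsets, lang):
    """Collect the lemma names in `lang` from a list of synsets, sorted."""
    return sorted({name for s in synsets for name in s.lemma_names(lang=lang)})


def italian_neighbours(word, wn, lang="ita"):
    """For each sense of `word`, list lemma names in `lang` for the sense
    itself (minus the word), its hypernyms and its hyponyms."""
    senses = []
    for synset in wn.synsets(word, lang=lang):
        senses.append({
            "sense": synset.name(),  # English-based synset identifier
            "synonyms": sorted(set(synset.lemma_names(lang=lang)) - {word}),
            "hypernyms": _names(synset.hypernyms(), lang),
            "hyponyms": _names(synset.hyponyms(), lang),
        })
    return senses


# Usage (needs the NLTK wordnet and OMW data to be installed):
#   from nltk.corpus import wordnet as wn
#   for entry in italian_neighbours("cane", wn):
#       print(entry)
```

Some lists may come back empty, since not every synset has lemmas in every language.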

Or since you mentioned "cattiva_persona" in the comments:

>>> wn.lemmas("bad_person")[0].synset().lemmas(lang="ita")
[Lemma('bad_person.n.01.cane'), Lemma('bad_person.n.01.cattivo')]

I went from the English lemma to the language-independent synset to the Italian lemmas.
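
The same route (lemma, then synset, then Italian lemmas) is one way to look for antonyms, which the question also asks about. The sketch below assumes, as I believe is the case in NLTK, that antonym links are recorded on the English lemmas, so calling `antonyms()` directly on an Italian lemma may return nothing. The function name `italian_antonyms` is hypothetical, and the reader is again passed in as `wn`.

```python
def italian_antonyms(word, wn, lang="ita"):
    """Return lemma names in `lang` for antonyms of any sense of `word`.

    Assumption: antonym pointers live on the English lemmas of a synset,
    so we detour through them and translate the result back to `lang`.
    """
    found = set()
    for synset in wn.synsets(word, lang=lang):
        for english_lemma in synset.lemmas():  # default language is English
            for antonym in english_lemma.antonyms():
                found.update(antonym.synset().lemma_names(lang=lang))
    return sorted(found)


# Usage (needs the NLTK wordnet and OMW data to be installed):
#   from nltk.corpus import wordnet as wn
#   print(italian_antonyms("buono", wn))
```

The result is sense-agnostic: it merges antonyms from every sense of the word, so filter by synset first if you need a specific meaning.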

alexis
  • Yes, thanks for the advice, but as you can see, it doesn't give me the Italian synonyms. It seems that `wn.lemmas("cane", lang="ita")` just translates the Italian word into English and returns what it finds. To be more explicit: instead of Lemma('bad_person') I'm expecting to get back Lemma('cattiva_persona'), which is the Italian for "bad person". – Frank B. May 11 '17 at 14:32
  • I don't speak Italian, but Wordnet is saying that one of the meanings of "cane" is "bad person". Wordnet is built on English, so the meanings have English names but multilingual lemmas. Think of the English side as variable names -- it doesn't matter what they are, what matters is the data they contain. I'll add an example of getting hypernyms. – alexis May 11 '17 at 20:10
  • @FrankB. I am working on a similar task; were you able to find a solution? – tahsintahsin Mar 16 '21 at 09:43
  • @tahsintahsin study the answer (edited since Frank's question), it's all explained there. This _is_ the solution. – alexis Mar 16 '21 at 12:37
  • @alexis: "cane" doesn't mean "bad person": "cane" means "dog". You can use "dog" to say that a person can't do something very well, eg. when you say that a musician is a dog you're saying he is a bad musician. When you say a writer is a dog you're saying he can't really write: he's a bad writer. BUT not a bad person. – Life after Guest Mar 30 '22 at 16:58
  • @Life, thanks for the explanation; but I didn't write Wordnet, I just described what it contains. Clearly it includes metaphorical as well as literal meanings (as long as they are conventional, I suppose). And somebody decided that the meaning "bad person" is sufficiently well expressed by "cane", regardless of additional meanings this word can (metaphorically) express. If this is truly incorrect (I have no opinion since I do not speak Italian), then the [Open Multilingual Wordnet](http://compling.hss.ntu.edu.sg/omw/) would be the place to try to correct it. – alexis Mar 30 '22 at 20:09
  • @alexis Good suggestion :-) – Life after Guest Apr 06 '22 at 14:11
  • Is it also possible to lemmatize Italian words with nltk? – sound wave Feb 07 '23 at 15:49
  • Not with wordnet ¯\_(ツ)_/¯ – alexis Feb 15 '23 at 08:35
  • Hi. I’m new to this so this might sound silly. I found [this](https://omwn.org/omw1.html). How do I download and use these language databases with nltk wordnet? There is an Italian db in the link above. Or are they included already? Many thanks. – Lam Le Apr 12 '23 at 19:28
  • @LamLe don't bother with the original OMW distribution, use the version that the nltk bundles. You'll find it in the interactive `nltk.download()` dialog, for example. Or just use `nltk.download("omw")` to download it. You only need to do this once, don't put it in your scripts! – alexis Apr 21 '23 at 10:46
  • I see! Thank you very much for coming back to this old answer to reply to me :D – Lam Le Apr 22 '23 at 11:15
8

Since I found myself wondering how to actually use the wordnet resources after reading this question and its answer, I'm going to leave here some useful information:

Here is a link to the nltk guide.

The two necessary commands to download wordnet data and thus proceed with the usage explained in the other answer are:

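import nltk

# One-time downloads: run these once interactively, not in every script.
nltk.download('wordnet')
nltk.download('omw')  # newer NLTK releases may ask for 'omw-1.4' instead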
Nicolò Gasparini
  • I got a LookupError saying "Resource omw-1.4 not found." and suggesting to run `nltk.download('omw-1.4')`, after this it worked – sound wave Feb 07 '23 at 15:37