How to get the Princeton WN sense ID given a sense offset? Python-NLTK

Question

I can get the sense offset of a Princeton WN sense as exposed by the NLTK corpus library:

[in]: 'dog.n.01'
>>> from nltk.corpus import wordnet as wn
>>> ss = wn.synset('dog.n.01')
>>> # NLTK 3.x: offset() and pos() are methods (they were attributes in NLTK 2.x)
>>> offset = str(ss.offset()).zfill(8) + "-" + ss.pos()
>>> print(offset)
[out]: '02084071-n'

That offset is similar to the convention used in http://casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi?synset=02084071-n&lang=eng
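The ID format itself (8-digit zero-padded offset, a hyphen, then the POS tag) can be handled without touching the corpus. A minimal sketch of the two conversions, assuming that layout; the helper names `format_offset_id` and `parse_offset_id` are made up for illustration and are not part of NLTK:

```python
def format_offset_id(offset, pos):
    """Zero-pad the numeric offset to 8 digits and append the POS tag."""
    return "%08d-%s" % (offset, pos)


def parse_offset_id(offset_id):
    """Split an ID such as '02084071-n' into (pos, offset as int)."""
    digits, pos = offset_id.split("-")
    return pos, int(digits)


print(format_offset_id(2084071, "n"))   # 02084071-n
print(parse_offset_id("02084071-n"))    # ('n', 2084071)
```

Parsing only recovers the POS tag and the integer offset; mapping those back to a synset still needs the corpus.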

How can I do the reverse without looping through the whole WordNet corpus? That is:

[in]: '02084071-n'
[out]: 'dog.n.01' or Synset('dog.n.01')

I could do this, but it takes far too long and wastes a lot of cycles:

[in]: '02084071-n'
from nltk.corpus import wordnet as wn
in_offset, in_pos = "02084071-n".split("-")
# Linear scan over every synset (NLTK 3.x: offset() and pos() are methods)
nltk_ss = [i for i in wn.all_synsets() if i.offset() == int(in_offset) and i.pos() == in_pos][0]
print(nltk_ss)
[out]: Synset('dog.n.01')

This question was already answered [here](http://stackoverflow.com/questions/8077641/wordnet-synset-offset/12378481#12378481). — Suzana, Mar 14 '13 at 12:02
Thanks, it seems the only way is either to read from the open WN or to loop through the whole set of synsets — alvas, Mar 14 '13 at 15:04

score 3 · Accepted Answer · answered Mar 14 '13 at 08:05

Unfortunately, you cannot do a reverse lookup without iterating over the corpus at least once (as you have shown). The only thing I can suggest is to build a dictionary once and reuse it if you are going to look up synsets by offset multiple times.

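>>> from nltk.corpus import wordnet as wn
>>> # Build the index once (NLTK 3.x: offset() is a method)
>>> senseIdToSynset = {s.offset(): s for s in wn.all_synsets()}
>>> senseIdToSynset[2084071]
Synset('dog.n.01')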
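One caveat with keying on the bare offset: offsets are positions within each part-of-speech data file, so the same number may appear under more than one POS. A sketch of the same caching idea keyed on the full `'02084071-n'` style ID instead; `build_offset_index` and `FakeSynset` are hypothetical names, and the demo uses stand-in objects with made-up entries so it runs without the WordNet data installed:

```python
from collections import namedtuple


def _value(attr):
    """Return attr() if it is callable (NLTK 3.x style), else attr itself."""
    return attr() if callable(attr) else attr


def build_offset_index(synsets):
    """Map 'NNNNNNNN-p' IDs to synset objects in a single pass."""
    index = {}
    for s in synsets:
        key = "%08d-%s" % (_value(s.offset), _value(s.pos))
        index[key] = s
    return index


# Stand-ins for real synsets; the second entry is invented to show that
# two parts of speech sharing an offset no longer overwrite each other.
FakeSynset = namedtuple("FakeSynset", ["name", "offset", "pos"])
demo = [
    FakeSynset("dog.n.01", 2084071, "n"),
    FakeSynset("hypothetical.v.01", 2084071, "v"),
]
index = build_offset_index(demo)
print(index["02084071-n"].name)

# With NLTK and its WordNet data installed, the same function should accept
# the real iterator:  index = build_offset_index(wn.all_synsets())
```

Depending on the NLTK version, `wn.synset_from_pos_and_offset(pos, offset)` (a private `_synset_from_pos_and_offset` in older releases, as used in the answer linked in the comments above) may make the full scan unnecessary, so it is worth checking what your installed release offers before building the index.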

Jared

25,627
7
56
61

really? only by looping through the whole wordnet?! – alvas Mar 14 '13 at 10:54
