How to get the WordNet synset given an offset ID?

Question

I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?

score 29 · Answer 1 · edited Apr 07 '20 at 06:07

As of NLTK 3.2.3, there's a public method for doing this:

wordnet.synset_from_pos_and_offset(pos, offset)

In earlier versions you can use:

wordnet._synset_from_pos_and_offset(pos, offset)

This returns a synset based on its POS and offset ID. I think this method is only available in NLTK 3.0, but I'm not sure.

Example:

from nltk.corpus import wordnet as wn
wn.synset_from_pos_and_offset('n',4543158)
>> Synset('wagon.n.01')
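
If the same script has to run on both sides of that version boundary, one option is to try the public name first and fall back to the private one. This is only a sketch: `lookup_synset` is a made-up helper name, and it assumes the reader object exposes at least one of the two methods mentioned above.

```python
def lookup_synset(reader, pos, offset):
    """Call whichever of the two lookup methods this reader provides."""
    # Prefer the public method (newer NLTK); otherwise use the private one.
    method = getattr(reader, "synset_from_pos_and_offset", None)
    if method is None:
        method = reader._synset_from_pos_and_offset
    return method(pos, offset)
```

With NLTK and the WordNet corpus installed, this would be called as `lookup_synset(wn, 'n', 4543158)`, with `wn` imported as in the example above.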

edited Apr 07 '20 at 06:07

MBT

answered Nov 26 '14 at 09:37

donners45

1

This solution requires a pos tag, while Suzana's doesn't. Can someone explain why the pos tag is necessary for wn.synset_from_pos_and_offset() ? – Daniel Loureiro Nov 09 '18 at 19:05
I'm curious about the above as well! – vsocrates May 29 '22 at 02:48
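
On the question in these comments: in the WordNet database files, a synset offset is a byte position inside the data file for one part of speech (data.noun, data.verb, and so on). The number alone therefore does not say which file to look in, and the POS supplies that. An ID like n#05576222 from the question already carries both pieces. A minimal sketch for splitting it, where `parse_synset_id` is a hypothetical helper and the `pos#offset` layout is assumed from the question's example:

```python
def parse_synset_id(synset_id):
    """Split an ID such as 'n#05576222' into a (pos, offset) pair."""
    pos, _, digits = synset_id.partition("#")
    # Accept the usual WordNet POS letters; 's' is the satellite adjective tag.
    if pos not in {"n", "v", "a", "r", "s"} or not digits.isdigit():
        raise ValueError("unrecognised synset ID: %r" % synset_id)
    return pos, int(digits)

pos, offset = parse_synset_id("n#05576222")
print(pos, offset)  # n 5576222
```

The pair can then be passed straight to `wn.synset_from_pos_and_offset(pos, offset)`.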

Suzana · Answer 2 · 2018-01-11T09:52:33.527

14

For NLTK 3.2.3 or newer, please see donners45's answer.

For older versions of NLTK:

There is no built-in method in NLTK, but you could use this:

from nltk.corpus import wordnet

syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)

offsets_dict[14204095]
>>> Synset('heatstroke.n.01')

You can then pickle the dictionary and load it whenever you need it.

For NLTK versions prior to 3.0, replace the line

offsets_list = [(s.offset(), s) for s in syns]

with

offsets_list = [(s.offset, s) for s in syns]

since prior to NLTK 3.0 offset was an attribute instead of a method.
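
One caveat with a dictionary keyed on the offset alone: offsets are only unique within a part of speech, so synsets from different POS files can share a number and overwrite each other (in WordNet 3.0, entity.n.01 and able.a.01 both sit at offset 1740). A possible variation, sketched below, keys on the (pos, offset) pair instead. `build_offset_index` and `FakeSynset` are illustrative names, not part of NLTK; the stand-in class only exists so the sketch runs without the corpus.

```python
def build_offset_index(synsets):
    """Map (pos, offset) -> synset for any iterable of synset-like objects."""
    return {(s.pos(), s.offset()): s for s in synsets}

class FakeSynset:
    """Hypothetical stand-in exposing the two Synset methods used above."""
    def __init__(self, name, pos, offset):
        self.name, self._pos, self._offset = name, pos, offset
    def pos(self):
        return self._pos
    def offset(self):
        return self._offset

demo = [FakeSynset("entity.n.01", "n", 1740), FakeSynset("able.a.01", "a", 1740)]
print(len({s.offset(): s for s in demo}))  # 1: the second entry overwrote the first
print(len(build_offset_index(demo)))       # 2: both synsets are kept
```

With NLTK 3.0 or later the real call would be `build_offset_index(wordnet.all_synsets())`. Note that satellite adjectives report their POS as 's', so they end up under ('s', offset) keys.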

edited Jan 11 '18 at 09:52

answered Sep 11 '12 at 21:53

Suzana

7

`offset` is now a method. Try this instead: `offsets_dict = {s.offset(): s for s in wn.all_synsets()}` – Omer Jan 27 '16 at 09:25
*"There is no built-in method in the NLTK"* - there is now! See donners45's answer; this one is obsolete. – Mark Amery Jan 07 '18 at 16:33

score 7 · Answer 3 · edited Jan 07 '18 at 16:29

You can use of2ss(). For example:

from nltk.corpus import wordnet as wn
syn = wn.of2ss('01580050a')

will return Synset('necessary.a.01')
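
Note that of2ss() expects the eight-digit offset followed by the POS letter, whereas the ID in the question has the POS first and a # separator. A small sketch of the conversion, assuming the question's `pos#offset` layout (`to_of2ss_key` is a hypothetical helper name):

```python
def to_of2ss_key(synset_id):
    """Turn 'n#05576222' into '05576222n', the form of2ss() takes."""
    pos, _, digits = synset_id.partition("#")
    # Pad to eight digits in case leading zeros were dropped.
    return digits.zfill(8) + pos

print(to_of2ss_key("n#05576222"))  # 05576222n
```

The result can be passed directly to `wn.of2ss(...)`.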

edited Jan 07 '18 at 16:29

Mark Amery

answered Mar 20 '17 at 14:36

carcar

alvas · Answer 4 · 2015-07-09T01:22:17.777

Other than using NLTK, another option is to use the Princeton WordNet .tab file from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I normally use the recipe below to load WordNet as a dictionary, with the offset as the key and a ;-delimited string of words as the value:

import codecs

# Returns every key whose ;-delimited value string contains the given value.
def getKey(dic, value):
  return [k for k, v in dic.items() if value in v.split(";")]

# Read Open Multi WN's .tab file
def readWNfile(wnfile, option="ss"):
  wn = {}
  with codecs.open(wnfile, "r", "utf8") as reader:
    for l in reader:
      if l[0] == "#": continue  # skip the header line
      fields = l.rstrip("\r\n").split("\t")
      if option == "ss":
        k, v = fields[0], fields[2]  # ss as key, word as value
      else:
        k, v = fields[2], fields[0]  # word as key, ss as value
      if k in wn:
        wn[k] = wn[k] + ";" + v  # append to the existing ;-delimited string
      else:
        wn[k] = v
  return wn

princetonWN = readWNfile('wn-data-eng.tab')
offset = "n#05576222"
pos, number = offset.split('#')
offset = number + '-' + pos  # "05576222-n", the ID format used in the .tab file

print(princetonWN[offset].split(";"))
print(getKey(princetonWN, 'heatstroke'))
