
I am new to the ImageNet and WordNet databases. I am trying to regroup the ImageNet images and categories into coarser classes (e.g. 'plant', 'fish', 'people', ...).

I understand that the images can be downloaded at http://www.image-net.org/synset?wnid=[wnid], and that this file maps each synset ID to the corresponding noun(s), but are there any rules governing the IDs (e.g. does each digit of the ID denote some category or sub-category)?

hikaru

2 Answers


As the official API documentation says, a wnid is an ImageNet identifier, not an NLTK one. You can map a word to its wnid by following the "Mapping between ImageNet and WordNet" section of the API documentation:

To uniquely identify a synset, we use "WordNet ID" (wnid), which is a concatenation of POS ( i.e. part of speech ) and SYNSET OFFSET of WordNet.

Firstly, get synsets and offsets in nltk:

from nltk.corpus import wordnet as wn

plant_list = wn.synsets('plant')
# plant_list is: [Synset('plant.n.01'), Synset('plant.n.02'), Synset('plant.n.03'), Synset('plant.n.04'), Synset('plant.v.01'), Synset('implant.v.01'), Synset('establish.v.02'), Synset('plant.v.04'), Synset('plant.v.05'), Synset('plant.v.06')]

offset = plant_list[0].offset()

Secondly, concatenate the POS and offset

Because ImageNet only covers nouns, keep just the noun synsets in plant_list and build each wnid with wnid = "n{:08d}".format(offset), i.e. the letter n for the POS followed by the offset zero-padded to 8 digits.

Because 'plant' has several noun synsets, you will get several wnids for it.
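The two steps above can be combined into a small helper. This is only a minimal sketch, not an official ImageNet utility: the function names make_wnid and noun_wnids are made up for illustration, and noun_wnids assumes NLTK and its WordNet corpus are installed (e.g. via nltk.download('wordnet')).

```python
def make_wnid(pos, offset):
    """Concatenate a POS letter and a synset offset zero-padded to 8 digits."""
    return "{}{:08d}".format(pos, offset)


def noun_wnids(word):
    """Return (wnid, synset name) pairs for every noun synset of `word`.

    Assumes NLTK and its WordNet corpus are installed. The import is local
    so that make_wnid stays usable without NLTK.
    """
    from nltk.corpus import wordnet as wn

    return [
        (make_wnid("n", synset.offset()), synset.name())
        for synset in wn.synsets(word, pos=wn.NOUN)
    ]


if __name__ == "__main__":
    # Prints one line per noun sense of 'plant'.
    for wnid, name in noun_wnids("plant"):
        print(wnid, name)
```

Passing pos=wn.NOUN to wn.synsets filters out the verb senses up front, so there is no need to pick the noun entries out of the list by hand.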

Jiang

As of March 11, 2021, ImageNet has stated publicly:

The new website is simpler; we removed tangential or outdated functions to focus on the core use case—enabling users to download the data, including the full ImageNet dataset and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Source

That means any service that relied on the API described in the so-called "official documentation" to parse and search ImageNet now needs to use NLTK instead (contrary to the answer above).

I confirmed this by submitting a helpdesk ticket after my service started returning nothing but 404s. The exchange follows:

Begin forwarded message:

From: ImageNet Support <imagenet.help.desk@gmail.com>
Subject: Re: wordnet api
Date: March 16, 2021 at 11:21:37 AM EDT
To: Aaron Soellinger <me@me>

Unfortunately we have updated the website and do not maintain these APIs any more. Any URLs from the old website may become invalid if they are not on the new website. For your use case, a workaround may be to query the WordNet hierarchy, e.g., by using the WordNet NLTK interface. 

On Tue, Mar 16, 2021 at 11:18 AM Aaron Soellinger <me@me> wrote:
below:

ss = 'http://www.image-net.org/synset?wnid={wnid}'
hyp = 'http://www.image-net.org/api/text/wordnet.structure.hyponym?wnid={wnid}'
word = 'http://www.image-net.org/api/text/wordnet.synset.getwords?wnid={wnid}'
mapg = 'http://www.image-net.org/api/text/imagenet.synset.geturls.getmapping?wnid={wnid}'
urlf = 'http://www.image-net.org/api/text/imagenet.synset.geturls?wnid={wnid}'

On Mar 16, 2021, at 11:17 AM, ImageNet Support <imagenet.help.desk@gmail.com> wrote:

Hello Aaron,

What is the URL of the API?

Best, 

On Tue, Mar 16, 2021 at 8:15 AM Aaron Soellinger <me> wrote:
Hi,

I was using the wordnet api found at image-net.org/api ..  I have noticed that it no longer works.  All my links return 404s. 

Can you help?

—aaron

So, ya, nltk it is.
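For anyone migrating, here is a rough sketch of how the word and hyponym lookups might be done locally with NLTK. It is not a drop-in replacement for the retired endpoints: the function names (split_wnid, synset_for, words_for, hyponym_wnids) are my own, the output format is not meant to match what the old API returned, and it assumes NLTK's WordNet corpus is installed and is the same WordNet version the wnids were taken from.

```python
import re


def split_wnid(wnid):
    """Split a wnid such as 'n02084071' into its POS letter and integer offset."""
    match = re.fullmatch(r"([a-z])([0-9]{8})", wnid)
    if match is None:
        raise ValueError("not a wnid: {!r}".format(wnid))
    return match.group(1), int(match.group(2))


def synset_for(wnid):
    """Look up the NLTK synset for a wnid (needs NLTK plus the WordNet corpus)."""
    from nltk.corpus import wordnet as wn

    pos, offset = split_wnid(wnid)
    return wn.synset_from_pos_and_offset(pos, offset)


def words_for(wnid):
    """Lemma names of the synset, i.e. the words the wnid stands for."""
    return synset_for(wnid).lemma_names()


def hyponym_wnids(wnid):
    """wnids of the direct hyponyms (children) of the synset."""
    return [
        "{}{:08d}".format(child.pos(), child.offset())
        for child in synset_for(wnid).hyponyms()
    ]
```

Calling hyponym_wnids repeatedly walks down the hierarchy one level at a time. The image URL lookups (geturls and friends) have no NLTK equivalent, since WordNet only holds the word hierarchy.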

Aaron Soellinger