I have saved a Gensim dictionary to disk. When I load it, the id2token attribute dict is not populated.

A simplified version of the code that saves the dictionary:

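from gensim import corpora
# tag_docs is an iterable of tokenized documents (lists of string tokens)
dictionary = corpora.Dictionary(tag_docs)
dictionary.save("tag_dictionary_lda.pkl")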

When I load it (in a Jupyter notebook), mapping tokens to IDs still works fine, but id2token does not: I cannot map IDs back to tokens, and in fact id2token is not populated at all.

> dictionary = corpora.Dictionary.load("../data/tag_dictionary_lda.pkl")
> dictionary.token2id["love"]
Out: 1613

> dictionary.doc2bow(["love"])
Out: [(1613, 1)]

> dictionary.id2token[1613]
Out: 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 dictionary.id2token[1613]

KeyError: 1613

> list(dictionary.id2token.keys())
Out: []

Any thoughts?

cjrieds

1 Answer

You don't need dictionary.id2token[1613]; you can use dictionary[1613] directly.

Note that if you check dictionary.id2token afterwards, it will no longer be empty. That's because dictionary.id2token is built only on request, to save memory (as stated in a comment in the Dictionary class's init). Looking up an ID through dictionary[...] is such a request, so it fills in the reverse mapping as a side effect.
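
To illustrate the build-on-request behaviour, here is a minimal pure-Python sketch of the pattern. `LazyVocab` is a hypothetical stand-in written for this example, not gensim's actual code; it only assumes that the forward mapping is built eagerly and that the reverse mapping is filled in the first time an ID is looked up through `[...]`.

```python
class LazyVocab:
    """Hypothetical stand-in for a token<->id dictionary (not gensim's code)."""

    def __init__(self, tokens):
        # The forward mapping (token -> id) is built eagerly.
        self.token2id = {}
        for token in tokens:
            self.token2id.setdefault(token, len(self.token2id))
        # The reverse mapping starts empty and is only built on request.
        self.id2token = {}

    def __getitem__(self, token_id):
        # Rebuild the reverse mapping whenever it is out of sync with token2id.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {i: t for t, i in self.token2id.items()}
        return self.id2token[token_id]


vocab = LazyVocab(["love", "hate", "love", "peace"])
love_id = vocab.token2id["love"]

before = dict(vocab.id2token)  # still empty: nothing has asked for it yet
token = vocab[love_id]         # lookup through __getitem__ builds id2token
after = dict(vocab.id2token)   # now fully populated

print(before, token, after)
```

With a real Dictionary the same idea applies: accessing id2token directly skips the step that builds it, whereas dictionary[some_id] triggers it first.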

Lenka Vraná
  • 1
    Thank you, this works. I had difficulty finding `id2token` in the documentation. Maybe I should submit a pull request to gensim to add docs explaining this. – cjrieds May 11 '17 at 21:12
  • 2
    Do you know what the intended purpose of `id2token[ix]` is? I mean, if it's not guaranteed to return the expected token and `dictionary[ix]` works just as well. – Thomas Fauskanger Jul 31 '17 at 10:40
  • 3
    I suppose there is some purpose behind this, but I don't have a clue. You could try asking somebody from the gensim team directly. – Lenka Vraná Aug 02 '17 at 12:52