I have a tokenized list of words in a vocabulary. (It's been passed through a set, so there are no duplicates.)
My problem
I want to write a function that builds a dictionary mapping each word to its index in the vocabulary.
My attempt
My current method is like so:
mapping = { w : vocabulary.index(w) for w in vocabulary }
This works, but it is far too slow, probably because vocabulary.index(w) is called once for each of thousands of words, and each call scans the list from the start.
Question
Is there a library that does this more efficiently, or simply a more efficient way to write it?
Thanks.
POSSIBLE SOLUTION 1
Currently, for each word in 'vocabulary', vocabulary.index() is called, which requires a scan through 'vocabulary' to find the index, and this is repeated for every word. As suggested in an answer, one possibility is to enumerate 'vocabulary' instead. This finds every index in a single pass, like so:
mapping = { w : i for i, w in enumerate(vocabulary) }
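For anyone who wants to check the difference, here is a rough sketch comparing the two approaches. The sample vocabulary and the function names build_with_index and build_with_enumerate are made up for illustration; the timings will vary by machine and vocabulary size.

```python
import timeit

# Hypothetical sample data standing in for the real, de-duplicated vocabulary list.
vocabulary = [f"word{i}" for i in range(2000)]


def build_with_index(vocab):
    # Original approach: list.index() scans from the start for every word,
    # so the total work grows quadratically with the vocabulary size.
    return {w: vocab.index(w) for w in vocab}


def build_with_enumerate(vocab):
    # Single pass: enumerate() yields (index, word) pairs directly.
    return {w: i for i, w in enumerate(vocab)}


# With no duplicates in the list, both approaches give the same mapping.
assert build_with_index(vocabulary) == build_with_enumerate(vocabulary)

slow = timeit.timeit(lambda: build_with_index(vocabulary), number=3)
fast = timeit.timeit(lambda: build_with_enumerate(vocabulary), number=3)
print(f"index(): {slow:.4f}s, enumerate(): {fast:.4f}s")
```

One thing to keep in mind: since the vocabulary came from a set, the order of the list (and therefore the indices) may differ between runs, so it may be worth fixing the order once (for example with sorted()) before building the mapping.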