I would like to access nltk.corpus.wordnet in a multithreaded environment. As soon as I enable multithreading, methods such as synsets() fail. If I disable it, everything works fine.
The error messages vary from run to run. Here is one example, which looks very much like a race condition to me:
File "/home/lhk/anaconda3/envs/dlab/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1342, in synset_from_pos_and_offset
assert synset._offset == offset
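To illustrate why a failing check like this points to a race, here is a minimal sketch. It assumes (this is not confirmed by the traceback) that the reader seeks and then reads on a single file handle shared by all threads. The data, `seek_to` and `read_record` are hypothetical stand-ins, not NLTK code, and the interleaving is written out by hand so the result is deterministic.

```python
import io

# Hypothetical data file: every record begins with its own byte offset,
# similar in spirit to the check `synset._offset == offset`.
DATA = "00000000 dog\n00000013 cat\n"
shared_file = io.StringIO(DATA)  # one handle shared by all "threads"


def seek_to(offset):
    """Move the shared file position (step 1 of a lookup)."""
    shared_file.seek(offset)


def read_record():
    """Read the record at the current position (step 2 of a lookup)."""
    fields = shared_file.readline().split()
    return int(fields[0]), fields[1]


# Two lookups interleaved by hand instead of by real threads:
seek_to(0)    # "thread A" wants the record at offset 0
seek_to(13)   # "thread B" runs in between and moves the shared position
found_offset, word = read_record()  # "thread A" reads, but gets B's record

# The equivalent of `assert synset._offset == offset` would fail here.
mismatch = found_offset != 0
```

If the real reader works this way, which lookup fails would depend on thread timing, which would also fit the error messages changing between runs.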
There are other questions about this:
The problem here was also caused by multithreading: What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?
This question has a more general title but seems to describe the same problem (multithreaded corpus loading fails): Python NLTK multi threading
There is also a GitHub issue about this: https://github.com/nltk/nltk/issues/1576
The solution to the first linked question was to load the corpus before the program branches into individual threads. I've done that: wordnet.ensure_loaded() is called before any threads are started.
The recommendation in the GitHub issue is to import wordnet within the threaded function. I tried that as well, but it doesn't change anything.
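For reference, the setup described above can be sketched roughly as follows. `count_synsets`, `run_threaded`, `main` and the word list are hypothetical names chosen for illustration, and the thread pool is an assumption about how the threads are started. This is the arrangement that still fails, not a fix.

```python
from concurrent.futures import ThreadPoolExecutor


def count_synsets(word):
    # Import inside the worker, as recommended in the GitHub issue.
    from nltk.corpus import wordnet
    return len(wordnet.synsets(word))


def run_threaded(words, worker=count_synsets, preload=None, max_workers=4):
    # Force the corpus to load once, in the main thread,
    # before any worker thread is started.
    if preload is not None:
        preload()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each word with its result.
        return dict(zip(words, pool.map(worker, words)))


def main():
    # Requires NLTK and the WordNet corpus to be installed.
    from nltk.corpus import wordnet
    print(run_threaded(["dog", "cat", "tree"], preload=wordnet.ensure_loaded))
```

Calling `main()` runs both attempted workarounds together: the corpus is loaded up front and each worker imports wordnet itself.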