I have a very simple list comprehension I would like to parallelize:
nlp = spacy.load(model)
texts = sorted(X['text'])
# TODO: Parallelize
docs = [nlp(text) for text in texts]
However, when I try using Pool from the multiprocessing module like so:
docs = Pool().map(nlp, texts)
It gives me the following error:
Traceback (most recent call last):
File "main.py", line 117, in <module>
main()
File "main.py", line 99, in main
docs = parse_docs(X)
File "main.py", line 81, in parse_docs
docs = Pool().map(nlp, texts)
File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 608, in get
raise self._value
File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 385, in _handle_tasks
put(task)
File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'FeatureExtracter.<locals>.feature_extracter_fwd'
Is it possible to do this parallel computation without having to make the objects pickleable? I'm open to examples that use third-party libraries such as joblib.
Edit: I also tried
docs = Pool().map(nlp.__call__, texts)
and that didn't work either.
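For what it's worth, here is a minimal stdlib-only sketch (hypothetical names, no spaCy needed) that reproduces the same class of error. It suggests the problem is the nested function somewhere inside the spaCy pipeline, not multiprocessing itself, since Pool.map pickles the callable it is given:

```python
import pickle

def make_processor():
    # A nested (local) function: pickle can't reference it by a
    # top-level qualified name, which is the same situation as
    # 'FeatureExtracter.<locals>.feature_extracter_fwd' in my traceback.
    def process(text):
        return text.upper()
    return process

processor = make_processor()

try:
    # Pool.map() does essentially this to ship the callable to workers.
    pickle.dumps(processor)
except AttributeError as err:
    # Same class of error as in the traceback above.
    print(err)
```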