Problem:
I am downloading financial data from a server and processing it afterwards. I gather data for multiple stocks at the same time.
The data downloader and the data processor need to run in parallel (the processor itself will consist of several processes).
The data for each stock absolutely must be processed serially, but when there is more than one stock, the stocks must be processed in parallel.
My understanding of the problem:
From what I gather, I need a way to route data from a single source to parallel processes, deciding beforehand which data goes to which process according to stock id (each stock has its own process).
I have tried a few different approaches without success so far; right now I am stuck on this error:
RuntimeError: Queue objects should only be shared between processes through inheritance
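This error typically appears when a `multiprocessing.Queue` is sent to another process indirectly, e.g. placed inside a `Manager().dict()` or put onto another queue, rather than handed to the child at creation time. Passing the queue directly in the `Process` constructor args (so the child inherits it) avoids the error. A minimal sketch (all names hypothetical):

```python
import multiprocessing as mp

def worker(q, out):
    # The queues arrive via Process args ("inheritance"),
    # which is the supported way to share them.
    total = 0
    while True:
        item = q.get()
        if item is None:      # sentinel: no more data
            break
        total += item
    out.put(total)

def run():
    q = mp.Queue()
    out = mp.Queue()
    # The queues are passed at construction time, not after start().
    p = mp.Process(target=worker, args=(q, out))
    p.start()
    for i in range(5):
        q.put(i)
    q.put(None)
    p.join()
    return out.get()
```

Here `run()` returns the sum the worker computed (10 for inputs 0..4).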
Possible Solution
The next thing I will try is a multiprocessing.Manager().dict() whose values are collections.deque, multiprocessing.Queue(), or list() instances, keyed by stock so that each stock's data reaches its own mp.Process() instance.
It is important that these data structures can be dynamically allocated, as I might change stocks at runtime.
Question
What is a performant way to approach this problem?
Intuitively it seems there should be a better way than multiprocessing.Manager().dict() for this task, but I haven't found it yet. Is there such a thing?