I am processing the following stream:
symbols = ['ABC', 'DFG', ... (52 of these)]
handlers = { symbol: Handler(symbol) for symbol in symbols }
for symbol, payload in lines:
    handlers[symbol].feed(payload)
I would like to split the work across the 4 CPU cores of my MacBook Pro, since I have 600M lines to process.
Each handler is independent of the others. However, each one holds state, so I can't have CPU1 process some 'ABC' lines while CPU2 processes other 'ABC' lines, or they might clash.
I imagine I need to use multiprocessing.Pool, something like this question: How to split python work between cores? (Multiprocessing lib).
I'm thinking I could compute core_index = symbols.index(symbol) % 4 and then do something like core[core_index].execute_function(process_line(symbol, payload)), but I don't think it's that simple. I probably need some queueing system.
Load should be similar between the four cores.
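A minimal sketch of that idea, for reference (the Handler class and the 4-symbol list here are hypothetical stand-ins for the real ones): give each worker process its own multiprocessing.Queue, let each worker build and own the handlers for the symbols routed to it, and have the parent route every line by symbols.index(symbol) % 4 so a given symbol always reaches the same process and its handler state is never shared.

```python
import multiprocessing as mp

NUM_WORKERS = 4  # one worker process per core

# Stand-in for the real 52-symbol list.
symbols = ['ABC', 'DFG', 'HIJ', 'KLM']
index = {s: i for i, s in enumerate(symbols)}

def route(symbol):
    # Fixed symbol -> worker mapping: the same symbol always lands on
    # the same process, so handler state can never clash.
    return index[symbol] % NUM_WORKERS

class Handler:
    # Hypothetical stand-in for the real stateful handler.
    def __init__(self, symbol):
        self.symbol = symbol
        self.lines_seen = 0

    def feed(self, payload):
        self.lines_seen += 1

def worker(queue):
    # Each worker creates handlers lazily for the symbols routed to it;
    # they live only in this process, so nothing is shared.
    handlers = {}
    while True:
        item = queue.get()
        if item is None:               # sentinel: stream finished
            break
        symbol, payload = item
        if symbol not in handlers:
            handlers[symbol] = Handler(symbol)
        handlers[symbol].feed(payload)

def main(lines):
    queues = [mp.Queue(maxsize=10_000) for _ in range(NUM_WORKERS)]
    procs = [mp.Process(target=worker, args=(q,)) for q in queues]
    for p in procs:
        p.start()
    for symbol, payload in lines:
        queues[route(symbol)].put((symbol, payload))
    for q in queues:
        q.put(None)                    # one stop sentinel per worker
    for p in procs:
        p.join()

if __name__ == '__main__':
    main([('ABC', 'payload-1'), ('DFG', 'payload-2')])
```

One caveat: index-modulo assignment balances the number of symbols per core, not the number of lines, so if a few symbols dominate the stream the cores will still be unevenly loaded.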