
I am processing the following stream:

symbols = ['ABC', 'DFG', ...]  # 52 of these

handlers = { symbol: Handler(symbol) for symbol in symbols }

for symbol, payload in lines:
    handlers[symbol].feed(payload)

I would like to split the work between the 4 CPU cores on my MacBook Pro, as I have 600M lines to process.

Each handler is independent of the others. However, each handler holds state, so I couldn't have CPU 1 process 'ABC' and then CPU 2 process 'ABC', or they might clash.

I imagine I need to use multiprocessing.Pool, something like this: How to split python work between cores? (Multiprocessing lib)

I'm thinking I could do `core_index = symbols.index(symbol) % 4`, and then something like `core[core_index].execute_function(process_line(symbol, payload))`, but I don't think it's that simple. I probably need some queueing system.
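A minimal sketch of that idea, not tested at your scale: a hand-rolled pool of four `multiprocessing.Process` workers, each reading from its own `Queue`, with lines routed by the `symbols.index(symbol) % 4` rule. The `Handler` here is a stand-in that just counts feeds (your real class replaces it), and the `results` queue is my addition so the parent can collect each handler's final state:

```python
import multiprocessing as mp

NUM_WORKERS = 4
SENTINEL = None  # pushed onto each queue to tell that worker to stop

class Handler:
    # stand-in for the real stateful Handler (assumption):
    # it just counts how many payloads it has been fed
    def __init__(self, symbol):
        self.symbol = symbol
        self.count = 0

    def feed(self, payload):
        self.count += 1

def worker(my_symbols, inbox, results):
    # each worker owns the handlers for its own subset of symbols,
    # so no handler is ever touched by more than one process
    handlers = {s: Handler(s) for s in my_symbols}
    for symbol, payload in iter(inbox.get, SENTINEL):
        handlers[symbol].feed(payload)
    # report final per-handler state back to the parent
    results.put({s: h.count for s, h in handlers.items()})

def run(symbols, lines):
    index = {s: i % NUM_WORKERS for i, s in enumerate(symbols)}
    inboxes = [mp.Queue() for _ in range(NUM_WORKERS)]
    results = mp.Queue()
    procs = [
        mp.Process(target=worker,
                   args=([s for s in symbols if index[s] == w],
                         inboxes[w], results))
        for w in range(NUM_WORKERS)
    ]
    for p in procs:
        p.start()
    for symbol, payload in lines:
        inboxes[index[symbol]].put((symbol, payload))  # the routing step
    for inbox in inboxes:
        inbox.put(SENTINEL)
    merged = {}
    for _ in procs:           # drain results before joining,
        merged.update(results.get())  # or a full queue can deadlock
    for p in procs:
        p.join()
    return merged
```

Two caveats: on macOS the default start method is spawn, so the `run()` call has to live under `if __name__ == '__main__':`; and pushing 600M tuples through queues one at a time has real overhead, so batching lines per `put` (say, a few thousand at a time) is likely worth it.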

Load should be similar between the four cores.
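If the per-symbol line counts are skewed, a plain round-robin assignment won't keep the loads similar. Assuming you can count (or estimate) lines per symbol up front, a greedy longest-processing-time split usually gets the four groups close to even; `assign_symbols` here is a hypothetical helper, not part of any library:

```python
import heapq

def assign_symbols(symbol_counts, num_workers=4):
    # greedy LPT scheduling: hand the heaviest remaining symbol to
    # whichever worker currently has the lightest total load
    heap = [(0, w, []) for w in range(num_workers)]  # (load, worker, symbols)
    heapq.heapify(heap)
    for symbol, count in sorted(symbol_counts.items(),
                                key=lambda kv: kv[1], reverse=True):
        load, w, assigned = heapq.heappop(heap)
        assigned.append(symbol)
        heapq.heappush(heap, (load + count, w, assigned))
    return {w: assigned for _, w, assigned in heap}
```

The resulting symbol groups would then replace the `index[s] == w` partition when spawning the workers.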

asked by P i
  • depending on the situation, I would look at creating a "pool" of your own so you have greater control over the creation of the workers, and have each worker handle a subset of the possible symbols. This would only really work well if you have a good distribution of symbols, otherwise if one symbol is much more common than others, it may still all fall to one core really... – Aaron Jul 23 '21 at 18:46
  • I've asked a more specific question, after looking at `pool`s: https://stackoverflow.com/questions/68504412/how-can-i-execute-a-function-on-a-cpu-core-and-get-a-callback-when-it-has-compl – P i Jul 23 '21 at 19:47

0 Answers