I'm trying to thread my code for better performance, using the multiprocessing library's Process module.
The skeleton of code is to create dictionaries for each thread that they work on, and after it's all done, the dictionaries are summed and saved to a file. The resources are created like:
histos = {}
for int i in range(number_of_threads):
histos[i] = {}
histos[i]['all'] = ROOT.TH1F objects
histos[i]['kinds_of'] = ROOT.TH1F objects
histos[i]['keys'] = ROOT.TH1F objects
Then in the Processes, each thread works with its own histos[thread_number] object, working on the contained ROOT.TH1Fs. However, my problem is that apparently if I start the threads with Process like this:
proc = {}
for i in range(Nthreads):
it0 = 0 + i * n_entries / Nthreads # just dividing up the workload
it1 = 0 + (i+1) * n_entries / Nthreads
proc[i] = Process(target=RecoAndRecoFix, args=(i, it0, it1, ch,histos))
# args: i is the thread id (index), it0 and it1 are indices for the workload,
# ch is a variable that is read-only, and histos is what we defined before,
# and the contained TH1Fs are what the threads put their output into.
# The RecoAndFix function works inside with histos[i], thus only accessing
# the ROOT.TH1F objects that are unique to it. Each thread works with its own histos[i] object.
proc[i].start()
then the threads do have access their histos[i] objects, but cannot write to them. To be precise, when I call Fill() on the TH1F histograms, no data is filled because it cannot write to the objects because they are not shared variables.
So here: https://docs.python.org/3/library/multiprocessing.html I've found that I should instead use multiprocessing.Array() to create an array that can be both read and written to by the threads, like this:
typecoder = {}
histos = Array(typecoder,number_of_threads)
for int i in range(number_of_threads):
histos[i] = {}
histos[i]['all'] = ROOT.TH1F objects
histos[i]['kinds_of'] = ROOT.TH1F objects
histos[i]['keys'] = ROOT.TH1F objects
However, it won't accept dictionary as a type. It will not work, it says TypeError: unhashable type: 'dict'
So what would be the best approach to solve this issue? What I need is to pass an instance of every "all kinds of keys" stored in dictionaries to each thread, so they work on their own. And they must be able to write these received resources.
Thanks for your help, and sorry if I'm overlooking something trivial, I did threaded code before, but not yet with python.