
I have time-intensive code that I would like to parallelize so it runs on multiple processors. Is this even possible?

```python
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    # 2-D Gaussian-weighted integrand in (A, k)
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

# Nested Riemann sum approximating the double integral of f_big
# over k in (0, 0.4) and A in (0, 20) with step dk
outer_sum = 0
dk = 0.00001
for k in np.arange(dk, 0.4, dk):
    inner_sum = 0
    for A in np.arange(dk, 20, dk):
        inner_sum += dk * f_big(A, k, 1e-5, 1e-5)
    outer_sum += inner_sum * dk

print(outer_sum)
```
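For reference, a minimal sketch of one way this could run on multiple processors: the standard-library `multiprocessing` module, with one k-slice of the integral per task. The helper name `inner_integral` is mine, not from the original code, and each slice is vectorized over A with numpy for good measure:

```python
import multiprocessing
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

dk = 0.0001  # coarser step than the original so the demo finishes quickly

def inner_integral(k):
    # One k-slice of the double integral: a vectorized sum over the A grid.
    A = np.arange(dk, 20, dk)
    return dk * np.sum(dk * f_big(A, k, 1e-5, 1e-5))

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:  # one worker per CPU core by default
        slices = pool.map(inner_integral, np.arange(dk, 0.4, dk))
    print(sum(slices))
```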
    `import threading` at the beginning of your file. Replace `f_big(...)` with `threading.Thread(target=f_big, args=(...)).start()`. – zondo Mar 03 '16 at 20:49
  • I got `Traceback (most recent call last): File "integrator.py", line 31, in inner_sum += dk * threading.Thread(target=f_big, args=(A, k, 1e-5, 1e-5)).start() TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'` – kilojoules Mar 03 '16 at 20:51
  • Haha. Sorry, I forgot about that. You might try using `inner_sum` as a global variable, passing `dk` to `f_big()`, and making `f_big()` modify `inner_sum`. You would need to create a list of the threads and then wait for them to finish before adding to `outer_sum`, though. – zondo Mar 03 '16 at 20:55
  • Hmm. Not sure if I'm following you. – kilojoules Mar 03 '16 at 20:59
  • To make `inner_sum` increment in multiple threads, it is probably better to have the threads themselves increment it. To do that, you would need `f_big()` to do the incrementing. For that, you would need to put `global inner_sum` at the beginning of `f_big()` and you would need to do `inner_sum += dk * (1 / (std_A...`. `dk` is not defined in `f_big()`, however, so you would need to pass it as another parameter. The last problem is that `outer_sum` can't be modified until `inner_sum` is done being evaluated, so you would need to wait for the threads to finish before adding to `outer_sum`. – zondo Mar 03 '16 at 21:05
  • To do that, you could create a list of the threads, and then use a `for` loop to wait for each one to finish: `for thread in threads: thread.join()`. The list could be created with `threads = []`, and then instead of `threading.Thread(target=f_big, args=(...)).start()`, you would do `thread = threading.Thread(target=f_big, args=(...))`, `thread.start()`, `threads.append(thread)`. – zondo Mar 03 '16 at 21:07 [a sketch of this pattern follows the comment thread]
  • You cannot parallelize a single for loop using MPI or message passing. Doing this (in essence, following a fork-join paradigm) is typically done with OpenMP in C/C++ or Fortran. Can you work with Cython? – NoseKnowsAll Mar 04 '16 at 00:09
  • Yes, I would have no problem extending this with Cython. – kilojoules Mar 04 '16 at 00:26
  • @kilojoules Then you should use OpenMP to parallelize your loop in (C)ython. Once you learn the basics of OpenMP, it should be pretty trivial. – NoseKnowsAll Mar 04 '16 at 16:26
  • @NoseKnowsAll In another question this loop was sped up using numba. Do you think parallelization would give a similar speedup, a greater one, or a lesser one? I'm on a Mac with four processors and I can use mpirun with virtual processors. https://stackoverflow.com/questions/35782977/use-numba-to-speed-up-for-loop – kilojoules Mar 04 '16 at 17:52 [a numba sketch follows the thread]
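Here is a minimal sketch of the thread-per-slice pattern zondo outlines above, with the corrections discussed in the thread (a shared accumulator, a list of threads, and `join()` to wait for them). The helper name `integrate_slice` is mine. Note that because of CPython's GIL, plain threads will not actually speed up this CPU-bound loop; the sketch only illustrates the accumulate-and-join pattern:

```python
import threading
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

dk = 0.001  # coarse grid: one thread per k value, so keep the thread count small
outer_sum = 0.0
lock = threading.Lock()

def integrate_slice(k):
    global outer_sum
    inner_sum = 0.0
    for A in np.arange(dk, 20, dk):
        inner_sum += dk * f_big(A, k, 1e-5, 1e-5)
    with lock:  # serialize updates to the shared accumulator
        outer_sum += inner_sum * dk

threads = []
for k in np.arange(dk, 0.4, dk):
    t = threading.Thread(target=integrate_slice, args=(k,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait for every slice to finish

print(outer_sum)
```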
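The numba route mentioned in the last comment can also deliver the OpenMP-style loop parallelism NoseKnowsAll describes, without leaving Python: `@njit(parallel=True)` compiles the function and `prange` splits the outer loop across cores, much like an OpenMP parallel for. This is a sketch under that assumption; the function name `integrate` is not from the original post:

```python
import numpy as np
from numba import njit, prange

@njit
def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

@njit(parallel=True)
def integrate(dk):
    outer_sum = 0.0
    nk = int(round(0.4 / dk))
    nA = int(round(20.0 / dk))
    for i in prange(1, nk):  # outer loop runs across all cores
        k = i * dk
        inner_sum = 0.0
        for j in range(1, nA):
            inner_sum += dk * f_big(j * dk, k, 1e-5, 1e-5)
        outer_sum += inner_sum * dk  # numba treats this as a parallel reduction
    return outer_sum

# Coarser dk than the original so the demo (and first-call JIT compile) is quick
print(integrate(0.0001))
```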

0 Answers