
I have time-intensive code that I would like to parallelize so it runs on multiple processors. Is this even possible?

```python
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    # 2-D Gaussian-weighted integrand in (A, k)
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

# Nested Riemann sum approximating the double integral of f_big
# over k in (0, 0.4) and A in (0, 20) with step dk
outer_sum = 0
dk = 0.00001
for k in np.arange(dk, 0.4, dk):
    inner_sum = 0
    for A in np.arange(dk, 20, dk):
        inner_sum += dk * f_big(A, k, 1e-5, 1e-5)
    outer_sum += inner_sum * dk

print(outer_sum)
```
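For reference, a minimal sketch of one way this could run on multiple processors: the standard-library `multiprocessing` module, with one k-slice of the integral per task. The helper name `inner_integral` is mine, not from the original code, and each slice is vectorized over A with numpy for good measure:

```python
import multiprocessing
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

dk = 0.0001  # coarser step than the original so the demo finishes quickly

def inner_integral(k):
    # One k-slice of the double integral: a vectorized sum over the A grid.
    A = np.arange(dk, 20, dk)
    return dk * np.sum(dk * f_big(A, k, 1e-5, 1e-5))

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:  # one worker per CPU core by default
        slices = pool.map(inner_integral, np.arange(dk, 0.4, dk))
    print(sum(slices))
```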
    `import threading` at the beginning of your file. Replace `f_big(...)` with `threading.Thread(target=f_big, args=(...)).start()`. – zondo Mar 03 '16 at 20:49
  • I got `Traceback (most recent call last): File "integrator.py", line 31, in inner_sum += dk * threading.Thread(target=f_big, args=(A, k, 1e-5, 1e-5)).start() TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'` – kilojoules Mar 03 '16 at 20:51
  • Haha. Sorry, I forgot about that. You might try using `inner_sum` as a global variable, passing `dk` to `f_big()`, and making `f_big()` modify `inner_sum`. You would need to create a list of the threads and then wait for them to finish before adding to `outer_sum`, though. – zondo Mar 03 '16 at 20:55
  • Hmm. Not sure if I'm following you. – kilojoules Mar 03 '16 at 20:59
  • To make `inner_sum` increment in multiple threads, it is probably better to have the threads themselves increment it. To do that, you would need `f_big()` to do the incrementing. For that, you would need to put `global inner_sum` at the beginning of `f_big()` and you would need to do `inner_sum += dk * (1 / (std_A...`. `dk` is not defined in `f_big()`, however, so you would need to pass it as another parameter. The last problem is that `outer_sum` can't be modified until `inner_sum` is done being evaluated, so you would need to wait for the threads to finish before adding to `outer_sum`. – zondo Mar 03 '16 at 21:05
  • To do that, you could create a list of the threads, and then use a `for` loop to wait for each one to finish: `for thread in threads: thread.join()`. The list could be created with `threads = []`, and then instead of `threading.Thread(target=f_big, args=(...)).start()`, you would do `thread = threading.Thread(target=f_big, args=(...))`, `thread.start()`, `threads.append(thread)`. – zondo Mar 03 '16 at 21:07 [a sketch of this pattern follows the comment thread]
  • You cannot parallelize a single for loop using MPI or message passing. Doing this (in essence, following a fork-join paradigm) is typically done with OpenMP in C/C++ or Fortran. Can you work with Cython? – NoseKnowsAll Mar 04 '16 at 00:09
  • Yes, I would have no problem extending this with Cython. – kilojoules Mar 04 '16 at 00:26
  • @kilojoules Then you should use OpenMP to parallelize your loop in (C)ython. Once you learn the basics of OpenMP, it should be pretty trivial. – NoseKnowsAll Mar 04 '16 at 16:26
  • @NoseKnowsAll In another question this loop was sped up using numba. Do you think parallelization would give a similar speedup, a greater one, or a lesser one? I'm on a Mac with four processors and I can use mpirun with virtual processors. https://stackoverflow.com/questions/35782977/use-numba-to-speed-up-for-loop – kilojoules Mar 04 '16 at 17:52 [a numba sketch follows the thread]
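Here is a minimal sketch of the thread-per-slice pattern zondo outlines above, with the corrections discussed in the thread (a shared accumulator, a list of threads, and `join()` to wait for them). The helper name `integrate_slice` is mine. Note that because of CPython's GIL, plain threads will not actually speed up this CPU-bound loop; the sketch only illustrates the accumulate-and-join pattern:

```python
import threading
import numpy as np

def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

dk = 0.001  # coarse grid: one thread per k value, so keep the thread count small
outer_sum = 0.0
lock = threading.Lock()

def integrate_slice(k):
    global outer_sum
    inner_sum = 0.0
    for A in np.arange(dk, 20, dk):
        inner_sum += dk * f_big(A, k, 1e-5, 1e-5)
    with lock:  # serialize updates to the shared accumulator
        outer_sum += inner_sum * dk

threads = []
for k in np.arange(dk, 0.4, dk):
    t = threading.Thread(target=integrate_slice, args=(k,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait for every slice to finish

print(outer_sum)
```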
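The numba route mentioned in the last comment can also deliver the OpenMP-style loop parallelism NoseKnowsAll describes, without leaving Python: `@njit(parallel=True)` compiles the function and `prange` splits the outer loop across cores, much like an OpenMP parallel for. This is a sketch under that assumption; the function name `integrate` is not from the original post:

```python
import numpy as np
from numba import njit, prange

@njit
def f_big(A, k, std_A, std_k, mean_A=10, mean_k=0.2, hh=100):
    return ( 1 / (std_A * std_k * 2 * np.pi) ) * A * (hh/50) ** k * np.exp( -1*(k - mean_k)**2 / (2 * std_k **2 ) - (A - mean_A)**2 / (2 * std_A**2))

@njit(parallel=True)
def integrate(dk):
    outer_sum = 0.0
    nk = int(round(0.4 / dk))
    nA = int(round(20.0 / dk))
    for i in prange(1, nk):  # outer loop runs across all cores
        k = i * dk
        inner_sum = 0.0
        for j in range(1, nA):
            inner_sum += dk * f_big(j * dk, k, 1e-5, 1e-5)
        outer_sum += inner_sum * dk  # numba treats this as a parallel reduction
    return outer_sum

# Coarser dk than the original so the demo (and first-call JIT compile) is quick
print(integrate(0.0001))
```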

0 Answers