
I have been trying to parallelize the whole function where it is called in main, or any of the segments of the function you see below, without luck, and I can't get past the TypeError: 'function' object is not iterable. I'd appreciate any suggestions.

import multiprocessing
import numpy as np
from joblib import Parallel, delayed
from multiprocessing import Pool
from scipy.cluster.vq import kmeans, vq   # assumed source of kmeans/vq

num_cores = multiprocessing.cpu_count()
parallel = Parallel(n_jobs=num_cores)
p = Pool(4)

def kmean(layerW,cluster):
    weights1d = np.reshape(layerW,-1)
    print(np.shape(weights1d))

    #Parallelizing Here
    centroids,_ = parallel(delayed(kmeans(weights1d, cluster)))
    idxs,_      = parallel(delayed(vq(weights1d,centroids)))

    #Here, using Parallel
    weights1d_q = parallel(delayed([centroids[idxs[i]] for i in range(len(weights1d))]))

    #OR --- using pool instead
    weights1d_q = p.map([centroids[idxs[i]] for i in range(len(weights1d))])
    weights4d_q  = np.reshape(weights1d_q, np.shape(layerW))
    return weights4d_q
Amir
  • What do you intend to parallelize? The kmeans algorithm? – mujjiga Aug 26 '19 at 05:25
  • that could be good as well, but I guess the bottleneck here is the `weights1d_q` assignment at the end when I try to assign centroid values to each element – Amir Aug 26 '19 at 05:29
  • Have you identified what `function object` it's talking about? – hpaulj Aug 26 '19 at 07:12
  • All those 3/4 instances I tried to parallelize. Tested them individually and had the same error – Amir Aug 26 '19 at 13:37
  • When you ask about an error, you should make it clear exactly where it occurs. Usually we do that by quoting the traceback. That is, the full error message. Which line of code is trying to iterate on a variable that is a function? My guess is that somewhere you assigned a variable, `x = foo`, when you should have done `x = foo()` or `x = foo(args)`. – hpaulj Aug 26 '19 at 16:06
  • Can you post code that we can copy and paste to see the problem? If you read the answer from @user3666197 and the joblib docs https://joblib.readthedocs.io/en/latest/parallel.html you'll see that your current `delayed` calls are wrong... – tomjn Aug 27 '19 at 13:04

1 Answer


Q : I can't get away with the TypeError: function object is not iterable

For the sake of the TypeError:

The TypeError exception is thrown due to wrong syntax: an ill-formatted call to joblib.Parallel( delayed( ... ) ... ) that disobeys the documented calling convention.

Example 1: a correct call:
This call follows the documented syntax-specification down to the last dot:

>>> from joblib import Parallel, delayed
>>> parallel = Parallel( n_jobs = -1 )
>>> import numpy as np
>>> parallel( delayed( np.sqrt ) ( i**2 ) for i in range( 10 ) )
#          ^  ^^^^^^^     ^^^^     ^^^^   |||
#          |  |||||||     ||||     ||||   vvv
#JOBS(-1):-+  |||||||     ||||     ||||   |||
#DELAYED:-----+++++++     ||||     ||||   |||
#FUN( par ):--------------++++     ||||   |||
#     |||                          ||||   |||
#     +++-FUN(signature-"decl.")---++++   |||
#     ^^^                                 |||
#     |||                                 |||
#     +++-<<<-<iterator>-<<<-<<<-<<<-<<<--+++
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

and the result generated confirms that the call was fully compliant and interpretable.

Example 2: a wrong call:

>>> from joblib import Parallel, delayed
>>> parallel = Parallel( n_jobs = -1 )
>>> import numpy as np
>>> parallel( delayed( np.sqrt( 10 ) ) )          #### THIS SLOC IS KNOWINGLY WRONG
#          ^  ^^^^^^^     ^^^^(????)  ????   ???  ####
#          |  |||||||     ||||        ||||   vvv  ####
#JOBS(-1):-+  |||||||     ||||        ||||   |||  ####
#DELAYED:-----+++++++     ||||        ||||   |||  #### DELAYED( <float64> )
#FUN( par ):--------------++++        ||||   |||  #### GOT NO CALLABLE FUN( par ) 
#     |||                             ||||   |||  ####        BUT A NUMBER
#     +++-FUN(signature-"decl.")------++++   |||  ####        FUN( signature )
#     ^^^                                    |||  ####        NOT PRESENT
#     |||                                    |||  ####        AND FEEDER
#     +++-<<<-<iterator>-<<<-<<<-<<<-<<<-<<<-+++  #### <ITERATOR> MISSING
#                                                 ####
Traceback (most recent call last):                ####   FOR DETAILS, READ THE O/P
  File "<stdin>", line 1, in <module>             ####   AND EXPLANATION BELOW
  File ".../lib/python3.5/site-packages/joblib/parallel.py", line 947, in __call__
    iterator = iter(iterable)
TypeError: 'function' object is not iterable

and the traceback confirms that the O/P used a syntax that is incompatible with the documented joblib.Parallel( delayed( ... ) ... ) calling convention.
Q.E.D.
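As a side note (not part of the original answer): even a single delayed task must reach Parallel wrapped in an iterable; a one-element list works where no generator is natural:

```python
import numpy as np
from joblib import Parallel, delayed

parallel = Parallel(n_jobs=-1, prefer="threads")
# the single delayed task is wrapped in a list, so Parallel receives an iterable:
result = parallel([delayed(np.sqrt)(100)])
print(result)   # [10.0]
```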


REMEDY :

Follow the joblib.Parallel( delayed( ... ) ... ) documented syntax:

#entroids, _ = parallel( delayed( kmeans(weights1d, cluster)))
#                                 ^^^^^^(..................)
#                                 ||||||(..................)
#THIS-IS-NOT-A-CALLABLE-BUT-VALUE-++++++(..................)
#
centroids, _ = parallel( delayed( kmeans ) ( weights1d, cluster ) for ... )
#                                 ^^^^^^     ^^^^^^^^^^^^^^^^^^   |||||||
#                                 ||||||     ||||||||||||||||||   vvvvvvv
# CALLABLE FUN()------------------++++++     ||||||||||||||||||   |||||||
#          FUN( <signature> )----------------++++++++++++++++++   |||||||
#               ^^^^^^^^^^^                                       |||||||
#               |||||||||||                                       |||||||
#               +++++++++++------------<<<--feeding-<iterator>----+++++++
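Applying the remedy to the O/P's function, a minimal runnable sketch might look as follows (assuming kmeans and vq come from scipy.cluster.vq, which the question does not show, and using the threads backend just to keep the sketch cheap):

```python
import numpy as np
from joblib import Parallel, delayed
from scipy.cluster.vq import kmeans, vq   # assumed source of kmeans/vq

def kmean(layerW, cluster):
    weights1d = np.reshape(layerW, -1).astype(float)
    # kmeans() and vq() are each a single call - there is nothing to
    # iterate over, so they are invoked directly, not through Parallel:
    centroids, _ = kmeans(weights1d, cluster)
    idxs, _     = vq(weights1d, centroids)
    # the per-element centroid lookup CAN be fed to Parallel as an
    # iterator of delayed( CALLABLE )( args ) items, per the documented syntax:
    weights1d_q = Parallel(n_jobs=2, prefer="threads")(
        delayed(centroids.__getitem__)(i) for i in idxs
    )
    return np.reshape(weights1d_q, np.shape(layerW))
```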

The best first step :

is to re-read the documented details of how joblib.Parallel was designed and what its modes of use are, so as to become better acquainted with the tool:

joblib.Parallel( n_jobs       = None,   # how many jobs will get instantiated
                 backend      = None,   # a method, how these will get instantiated
                 verbose      = 0,
                 timeout      = None,
                 pre_dispatch = '2 * n_jobs',
                 batch_size   = 'auto',
                 temp_folder  = None,
                 max_nbytes   = '1M',
                 mmap_mode    = 'r',
                 prefer       = None,   # None | { ‘processes’, ‘threads’ }
                 require      = None    # None | ‘sharedmem’ ~CONSTRAINTS backend
                 )
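As a minimal illustration of the constructor above (the parameter choices here are arbitrary, just for the sketch):

```python
from joblib import Parallel, delayed

# threads are cheap to start and share memory, but stay GIL-bound
# for pure-python CPU-heavy work ( see the notes on the GIL below ):
pool = Parallel(n_jobs=4, prefer="threads", verbose=0)
squares = pool(delayed(pow)(i, 2) for i in range(10))
print(squares)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```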

Next, one may master a trivial example ( and experiment with it and extend it towards one's intended use-case ):

      Parallel(  n_jobs = 2 ) ( delayed( sqrt ) ( i ** 2 ) for i in range( 10 ) )
      #          ^              ^^^^^^^  ^^^^     ^^^^^^   |||
      #          |              |||||||  ||||     ||||||   vvv
      #JOBS:-----+              |||||||  ||||     ||||||   |||
      #DELAYED:-----------------+++++++  ||||     ||||||   |||
      #FUN( par ):-----------------------++++     ||||||   |||
      #     |||                                   ||||||   |||
      #     +++--FUN(-signature-"declaration"-)---++++++   |||
      #     ^^^                                            |||
      #     |||                                            |||
      #     +++-<<<-<iterator>-<<<-<<<-<<<-<<<-<<<-<<<-<<<-+++

      Parallel(  n_jobs = -1 ) ( 
                 delayed( myTupleConsumingFUN ) ( # aFun( aTuple = ( a, b, c, d ) )
                           aTupleOfParametersGeneratingFUN( i ) )
                 for                                        i in range( 10 )
                 )
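A concrete, runnable instance of the tuple-feeding pattern above (the function and generator names are illustrative, not taken from the question):

```python
from joblib import Parallel, delayed

def my_tuple_consuming_fun(aTuple):        # aFun( aTuple = ( a, b ) )
    a, b = aTuple
    return a * b

def a_tuple_generating_fun(i):             # builds the parameter tuple for step i
    return (i, i + 1)

results = Parallel(n_jobs=2, prefer="threads")(
    delayed(my_tuple_consuming_fun)(a_tuple_generating_fun(i))
    for i in range(5)
)
print(results)   # [0, 2, 6, 12, 20]
```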

NEXT: try to understand the costs and limits of using n_jobs instantiation(s)

The default backend of joblib will run each function call in isolated Python processes, therefore they cannot mutate a common Python object defined in the main program.

However if the parallel function really needs to rely on the shared memory semantics of threads, it should be made explicit with require='sharedmem'

Keep in mind that relying on the shared-memory semantics is probably suboptimal from a performance point of view, as concurrent access to a shared Python object will suffer from lock contention.
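A minimal sketch of the require = 'sharedmem' constraint (the mutation here is trivially cheap, so the lock-contention caveat above does not bite):

```python
from joblib import Parallel, delayed

shared = []                    # a common Python object in the main program

def record(i):
    shared.append(i)           # the mutation is visible to the main program only
                               # because sharedmem forces the threading backend

Parallel(n_jobs=2, require="sharedmem")(delayed(record)(i) for i in range(5))
print(sorted(shared))   # [0, 1, 2, 3, 4]
```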

Using the threads-based backend permits "sharing", yet it implicates an immense cost of doing so: threads re-introduce GIL-stepping, which re-[SERIAL]-ises the flow of code-execution back into a one-after-another-after-another GIL-lock-stepping fashion. For computing-intensive processing this yields worse performance than the original pure-[SERIAL] code ( though this mode can help for latency-masking use-cases, where a thread waiting for a network response releases the GIL-lock and lets other threads go ahead and continue the work ).

There are steps one may implement so as to make separate process-based computations able to communicate such a need, yet that comes at some add-on cost.

Computing-intensive problems have to balance the need for ultimate performance ( using more cores ) against keeping each work-unit an isolated (split) piece with minimum add-on costs for parameter transfers and result returns, all of which may easily cost more than a badly designed attempt to harness the joblib.Parallel forms of just-[CONCURRENT] process-scheduling.

For more details on joblib.Parallel

For more details on add-on costs and atomicity-of-work implications on parallel-speedup

user3666197
  • thanks for the reply @user3666197, but none were related to my question! – Amir Aug 26 '19 at 20:57
  • @Amir With all respect, after 40+ years spent designing, analyzing and profiling HPC-grade **parallel-computing** and distributed-computing systems, I own a pool of hands-on experience and dare to claim that all of these aspects are important for making the code both error-free and as close as possible to HPC-grade performance. *Feel free to express other arguments altogether with a reproducible MVC-problem formulation* ( none of which was present here so far for a benchmark ), but voting down ( penalising a scenario of steps ) is not compatible with StackOverflow Community Netiquette. – user3666197 Aug 26 '19 at 21:30
  • I appreciate your time to provide an answer which was informative, but the text of your answer does not reflect on my question on why the above function can't be parallelized neither at the def function level nor at the individual segments. – Amir Aug 26 '19 at 21:39
  • @Amir You are not right. The post EXACTLY REFLECTS the error that is the root-cause of your failing code. The demonstration presented in the **[The best first step]** and **[NEXT]** paragraphs has provided sufficient proof of that. Most importantly, if it were followed ( instead of downvoting and further objections ), it would lead your steps towards working code. Remarks about the performance ceiling and RAM-limitations were a bonus for possible further shaping of your parallelisation skills, yet the steps above were a direct way to remove the principal bug in your original code-sample. The documented syntax was disobeyed. – user3666197 Aug 27 '19 at 12:40