2

I have Python code that uses the subprocess package to run a script in a shell:

subprocess.call("mycode.py", shell=inshell)

When I run the top command, I see that I am only using ~30% or less of the CPU. I realize some commands may be using the disk rather than the CPU, so I timed the execution. Running this on a Linux system seems slower than on a 2-core Mac.

How do I parallelize this with the threading or multiprocessing package so that I can use multiple CPU cores on that Linux system?

Luca Angioloni
Arya
  • Do you want to use multiple threads or processes so that you can execute the code in `mycode.py` multiple times? Or do you want to execute `mycode.py` only once and make it go faster by parallelizing the work? – FMc Jan 13 '17 at 01:52
  • Yes, I want to execute mycode.py only once but make it go faster by parallelizing. – Arya Jan 13 '17 at 01:52

3 Answers

3

To parallelize the work done in mycode.py, you need to organize the code so that it fits into this basic pattern:

# Import the kind of pool you want to use (processes or threads).
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

# Collect work items as an iterable of single values (e.g. tuples,
# dicts, or objects). If you can't hold all items in memory,
# define a function that yields work items instead.
work_items = [
    (1, 'A', True),
    (2, 'X', False),
    ...
]

# Define a callable to do the work. It should take one work item.
def worker(tup):
    # Do the work.
    ...

    # Return any results.
    ...

# Create a ThreadPool (or a process Pool) of desired size.
# What size? Experiment. Slowly increase until it stops helping.
pool = ThreadPool(4)

# Do work and collect results.
# Or use pool.imap() or pool.imap_unordered().
work_results = pool.map(worker, work_items)

# Wrap up.
pool.close()
pool.join()

---------------------

# Or, in Python 3.3+ you can do it like this, skipping the wrap-up code.
with ThreadPool(4) as pool:
    work_results = pool.map(worker, work_items)
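
To make the template concrete, here is a minimal, self-contained sketch that fills it in with a hypothetical CPU-bound task (the worker, item values, and pool size here are illustrative, not part of the answer above):

# A hypothetical filling-in of the pattern: square-summing is just a
# stand-in for whatever CPU-bound work each item requires.
from multiprocessing import Pool

def worker(tup):
    n, label, flag = tup                      # unpack one work item
    return (label, sum(i * i for i in range(n)))

work_items = [(10**6, 'A', True), (2 * 10**6, 'X', False)]

if __name__ == '__main__':
    with Pool(4) as pool:                     # process pool: uses multiple cores
        work_results = pool.map(worker, work_items)
    print(work_results)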
FMc
  • The different use-cases for process pools and thread pools are worth mentioning here, given the effects of CPython's GIL on threading. – Roland Smith Jan 13 '17 at 22:37
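
To expand on that comment: with CPython's GIL, a ThreadPool only helps when the work releases the GIL (I/O, subprocess calls, some C extensions), while a process Pool can run pure-Python CPU-bound work on several cores. A rough sketch contrasting the two, using a hypothetical CPU-bound function:

import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def cpu_bound(n):
    # Pure-Python loop: it holds the GIL, so threads cannot speed it up.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    items = [5 * 10**6] * 4
    for label, make_pool in (('processes', Pool), ('threads', ThreadPool)):
        start = time.time()
        with make_pool(4) as pool:
            pool.map(cpu_bound, items)
        print(label, round(time.time() - start, 2), 'seconds')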
2

A little change to FMc's answer:

from multiprocessing import Pool
import time

work_items = [(1, 'A', True), (2, 'X', False), (3, 'B', False)]

def worker(tup):
    for i in range(5000000):
        print(work_items)
    return

pool = Pool(processes=8)
start = time.time()
work_results = pool.map(worker, work_items)
end = time.time()
print(end - start)
pool.close()
pool.join()

The code above takes 53.60 seconds. The trick below, however, takes 27.34 seconds.

from multiprocessing import Pool
from functools import partial
import time

work_items = [(1, 'A', True), (2, 'X', False), (3, 'B', False)]

def worker(tup):
    for i in range(5000000):
        print(work_items)
    return

def parallel_attribute(worker):
    def easy_parallelize(worker, work_items):
        pool = Pool(processes=8)
        work_results = pool.map(worker, work_items)
        pool.close()
        pool.join()
        return work_results
    return partial(easy_parallelize, worker)

worker.parallel = parallel_attribute(worker)

start = time.time()
work_results = worker.parallel(work_items)
end = time.time()
print(end - start)

Two comments: 1) I didn't see much of a difference when using multiprocessing.dummy. 2) Using Python's partial function (a nested-scope wrapper) works wonderfully and reduces the computation time by half. Reference: https://www.binpress.com/tutorial/simple-python-parallelism/121
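
For reference, functools.partial simply pre-binds arguments to a callable, which is what lets easy_parallelize be attached to worker and later called with only the work items; a tiny sketch:

from functools import partial

def add(a, b):
    return a + b

add_five = partial(add, 5)   # pre-bind a=5
print(add_five(3))           # prints 8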

Also, thank you FMc!

Arya
1

Well, you can first create a thread and pass it the function you want to parallelize. Inside that function, you make the subprocess call.

import threading
import subprocess

def worker():
    """Thread worker function: run the script from the question."""
    print('Worker')
    # shell=True is assumed here; the question passes a variable (inshell).
    subprocess.call("mycode.py", shell=True)
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
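
If you also want to wait for the calls to finish and check their exit codes, here is a minimal extension of that sketch (the command string and thread count are assumptions, not something the answer specifies):

import subprocess
import threading

return_codes = []
lock = threading.Lock()

def worker():
    # subprocess.call blocks until the command exits and releases the GIL
    # while waiting, so plain threads are fine for this.
    rc = subprocess.call("mycode.py", shell=True)
    with lock:
        return_codes.append(rc)

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(return_codes)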
Tim Givois
  • Thank you, just wondering: when you specify the number of iterations as 5, how does one determine what the optimum number would be? What determines how many threads will be spreading the subprocess.call function? – Arya Jan 13 '17 at 01:46
  • Well, there is no fixed rule; it depends on the CPU usage. I would try 3 threads first, since you say you are using ~30% of the CPU. – Tim Givois Jan 13 '17 at 02:10
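
Regarding the sizing question above, a common starting point is the number of CPU cores reported by the standard library, adjusted up or down by experiment; a minimal sketch:

import os

# os.cpu_count() can return None on unusual platforms, so fall back to a default.
n_workers = os.cpu_count() or 4
print("starting with", n_workers, "workers")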