I have 28 input files and 28 available CPUs. I wrote a Python script that uses subprocess to parse the input files with QualityAnalysisMain_v2.py. Right now it works fine on one CPU. What I would like to do is process the files in parallel, one input file per CPU, so that all 28 runs happen at the same time.

I tried the approach from this question: python spreading subprocess.call on multiple CPU cores

especially this code:

import threading
import subprocess

def worker():
    """thread worker function"""
    print 'Worker'
    subprocess.call(mycode.py, shell=inshell)
    return

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

which I adapted like this:

for pliki in os.listdir(input_data):
    nazwa = pliki.split(".")[0]
    subprocess.call("mkdir " + output_data + nazwa, shell=True)

def SubprocessFiles():
    for pliki in os.listdir(input_data):
        print("python QualityAnalysisMain_v2.py " + input_data + pliki)
        subprocess.call("python QualityAnalysisMain_v2.py " + input_data + pliki, shell = True)
        return

threads = []
for i in range(28):
    t = threading.Thread(target=SubprocessFiles)
    threads.append(t)
    t.start()

But it ended up parsing the first input file 28 times.

Here is my original code. It runs the script for each file in the input directory, but only on one CPU:

for pliki in os.listdir(input_data):
    nazwa = pliki.split(".")[0]
    print("python QualityAnalysisMain_v2.py " + input_data + pliki)
    subprocess.call("mkdir " + output_data + nazwa, shell = True)
    subprocess.call("python QualityAnalysisMain_v2.py " + input_data + pliki, shell = True)

Many thanks for any suggestions.

best, Agata

Agata

1 Answer

Every thread runs the same loop over the same directory listing, and the return inside the loop exits after the first file, so all of them process the first entry. You should instead pass a single file name to each thread:

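def SubprocessFiles(pliki):
    # Run the analysis script on a single input file.
    print("python QualityAnalysisMain_v2.py " + input_data + pliki)
    subprocess.call("python QualityAnalysisMain_v2.py " + input_data + pliki, shell=True)

threads = []
for pliki in os.listdir(input_data):
    # One thread per input file; each thread only waits on its own subprocess.
    t = threading.Thread(target=SubprocessFiles, args=[pliki])
    threads.append(t)
    t.start()

# Wait for all runs to finish before continuing.
for t in threads:
    t.join()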
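
The same idea of one file name per worker can also be written with a thread pool, which caps how many runs are active at once and collects the exit codes. This is only a sketch under some assumptions: the helper names build_command, run_one and run_all are hypothetical, paths are built with os.path.join rather than string concatenation, and the command is passed as an argument list so that no shell is needed. The threads do no heavy work themselves; each one only waits for its child process, so the operating system is free to spread the child processes across the available CPUs.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_command(script, path):
    """Return the argument list that runs `script` on one input file."""
    # sys.executable is the interpreter running this script (an assumption;
    # the original code simply calls "python").
    return [sys.executable, script, path]


def run_one(cmd):
    """Run one command, wait for it, and return its exit code."""
    return subprocess.call(cmd)


def run_all(input_dir, script, max_workers=28):
    """Run `script` once per file in `input_dir`, at most `max_workers` at a time."""
    paths = [os.path.join(input_dir, name) for name in sorted(os.listdir(input_dir))]
    commands = [build_command(script, p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as `commands`.
        exit_codes = list(pool.map(run_one, commands))
    return dict(zip(paths, exit_codes))
```

It could be called as run_all(input_data, "QualityAnalysisMain_v2.py"), and a non-zero value in the returned dictionary would point to a file whose run failed.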
Serge Ballesta