
Lately I have been using ProcessPoolExecutor to speed up some functions I wrote.

I have a question about one function I would like to accelerate.

This function

    def thefunction(input_file, output_file, somepar):

involves opening and reading the input file, processing it, and writing the results to an output file.

Right now I am doing

    lista = glob.glob(os.path.join(args.thefolders, 'path/this.json'))

    for filen in lista:
        print("Processing ", filen)
        thefunction(filen, None, args.somepar)

I would like to do some multiprocess mapping like

    with ProcessPoolExecutor() as process_pool:
        work_done = list(process_pool.map(partial(thefunction, somepar=args.somepar), lista))

But I am a bit worried, since the function involves I/O.

Provided that the files accessed are different for every member of the list, is it safe to do this?

halfer
KansaiRobot

1 Answer


If each process accesses different files, concurrent I/O from multiple processes is completely safe.

If the files are the same, such an operation is unsafe and would require using a synchronization primitive such as a lock, which would make the multiprocessing inefficient.
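A minimal sketch of the safe case, using a hypothetical stand-in for `thefunction` (the real processing logic is not shown in the question). One detail worth noting: with the signature as written, `map` would call `thefunction` with only `input_file`, so `output_file` needs a default (or a value bound via `partial`); the sketch gives it a default of `None`:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def thefunction(input_file, output_file=None, somepar=1):
    # Hypothetical stand-in for the real function: each call reads its
    # own input file and writes its own output file, so no two worker
    # processes ever touch the same path.
    with open(input_file) as f:
        data = f.read()
    result = data.upper() * somepar  # placeholder "processing"
    if output_file is None:
        output_file = input_file + ".out"
    with open(output_file, "w") as f:
        f.write(result)
    return output_file


def process_all(lista, somepar):
    # One task per file: partial() fixes somepar, map() supplies
    # input_file, and output_file falls back to its default inside
    # each worker, so every process writes to a distinct path.
    with ProcessPoolExecutor() as process_pool:
        return list(process_pool.map(
            partial(thefunction, somepar=somepar), lista))
```

Because every task reads and writes distinct paths, no lock is needed. If several tasks had to write into one shared file, a simpler alternative to locking is to return the results from the workers and let the parent process do the single-file write after `map` completes.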

Bharel