
I am trying to read files with multiprocessing in Python. Here is a small example:

import multiprocessing

class class1():
    def function(self, datasheetname):
        # here I start reading my datasheet
        pass

if __name__ == '__main__':
    # Test with multiprocessing
    pool = multiprocessing.Pool(processes=4)
    pool.map(class1("Datasheetname"))
    pool.close()

Now I get the following error:

TypeError: map() missing 1 required positional argument: 'iterable'

In another thread on this board I was given the hint to do this with a ThreadPool, but I don't know how to do that. Any ideas?

Big McLargeHuge
John28
  • Do you need to do this in parallel, or do you need to read in a bunch of CSV/Excel sheets? If the latter, maybe look into using [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) or [pandas.read_excel](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html), which can read in multiple files/sheets with a single call. – David Kretch Dec 03 '16 at 23:43

1 Answer


Pool.map:

map(func, iterable[, chunksize])

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
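As a small illustration of the chunking described above (a sketch, not part of the original answer; `square` is a made-up worker function), `chunksize` can be passed as the third argument to `map`:

```python
from multiprocessing import Pool

def square(x):
    # trivial worker: runs in a worker process
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # the 8 inputs are split into chunks of 2 before being
        # submitted to the pool as separate tasks
        results = pool.map(square, range(8), chunksize=2)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The results come back in input order regardless of which worker handled which chunk.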

You need to pass an iterable; each of its elements is passed as an argument to the target func in one of the worker processes.

Example:

from multiprocessing import Pool

def function(sheet):
    # do something with sheet
    return "foo"

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = pool.map(function, ['sheet1', 'sheet2', 'sheet3', 'sheet4'])
    pool.close()
    # result will be ['foo', 'foo', 'foo', 'foo']
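Since the question mentions ThreadPool: `multiprocessing.pool.ThreadPool` exposes the same `map` API but uses threads instead of processes, which also avoids having to pickle a bound method. A sketch adapting the question's class (the body of `function` is a placeholder for the actual datasheet-reading code, and the sheet names are made up):

```python
from multiprocessing.pool import ThreadPool

class class1():
    def function(self, datasheetname):
        # placeholder for the actual datasheet-reading code
        return "read " + datasheetname

if __name__ == '__main__':
    instance = class1()
    with ThreadPool(processes=4) as pool:
        # pass the bound method plus an iterable of sheet names
        results = pool.map(instance.function, ['sheet1', 'sheet2'])
    print(results)  # ['read sheet1', 'read sheet2']
```

Note that `map` still takes two arguments here: the callable, and the iterable whose elements are fed to it.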
sirfz