I want to parallelize a fairly complex function in Python. I have tested the approach with simpler code like this:

import pandas as pd
from joblib import Parallel, delayed, parallel_backend
from joblib import load, dump

def print_hello(hallo, tschüß, rechnen, i):
    print(i)
    print(hallo[2])
    print(tschüß)
    rechnen = rechnen + i
    # write the list to a per-job CSV file and read it back
    hallo2 = pd.DataFrame(hallo)
    hallo2.to_csv('./hallo' + str(i) + '.csv')
    hallo1 = pd.read_csv('./hallo' + str(i) + '.csv')
    return rechnen

hallo = ['hallo', 'hi', 'hey']
tschüß = 'tschüß'
rechnen = 0  # assumed starting value; it must exist before the call below
with parallel_backend('threading'):
    test = Parallel()(delayed(print_hello)(hallo, tschüß, rechnen, i) for i in range(10))

print(test)

This works quite nicely. However, I get the following error:

joblib.my_exceptions.TransportableException: TransportableException

...

joblib.my_exceptions.JoblibTypeError: JoblibTypeError

...

TypeError: sum_row() missing 1 required positional argument: 'i'

This happens when I try to do the same with my more complex function, which looks like this:

def sum_row(count_series, path, folder, files_1, files_2, files_3, path_raw, i):
    print(i)
    df1 = pd.read_csv(path_raw + files_1[i], sep=',', low_memory=False)
    df2 = pd.read_csv(path_raw + files_2[i], sep=',', low_memory=False)
    df3 = pd.read_csv(path_raw + files_3[i], sep=',', low_memory=False)

    ##do some operations with those files and create df_test

    df_test.to_csv(path + folder + files_export[i])

    return 0

with parallel_backend('threading'):
    test = Parallel()(delayed(sum_row)(count_series, path, files_1, files_2, files_3,  path_raw, i) for i in range(len(files_1)))
Mimi Müller

1 Answer


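You are getting the error because you are not passing the folder argument when calling the function. Positional arguments are bound in order, so every value after path shifts one parameter to the left and nothing is left over for i.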
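A minimal, joblib-free reproduction may make this clearer. It assumes only the signature of sum_row from the question; the body and the argument values are placeholders.

```python
def sum_row(count_series, path, folder, files_1, files_2, files_3, path_raw, i):
    # Placeholder body: only the signature matters for this demonstration.
    return 0

def call_without_folder():
    """Mimic the question's call, which leaves out `folder`."""
    try:
        # Seven positional values for eight parameters: files_1 lands in
        # `folder`, files_2 in `files_1`, files_3 in `files_2`, path_raw in
        # `files_3`, and the loop index in `path_raw`, so `i` gets nothing.
        sum_row("counts", "./out/", ["a.csv"], ["b.csv"], ["c.csv"], "./raw/", 0)
    except TypeError as exc:
        return str(exc)
    return None

message = call_without_folder()
print(message)  # sum_row() missing 1 required positional argument: 'i'
```

Passing folder in its place supplies all eight values the function expects, which is what the corrected call does.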

test = Parallel()(delayed(sum_row)(count_series, path, folder, files_1, files_2,
                                   files_3, path_raw, i) for i in range(len(files_1)))
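A related safeguard, sketched here as an assumption rather than as part of the answer above: bind the fixed arguments by keyword with functools.partial so that only i varies per job. A forgotten argument then produces an error that names it directly instead of complaining about i. The stand-in body and the sample file names below are hypothetical, and the jobs run in a plain loop so the sketch needs only the standard library.

```python
from functools import partial

def sum_row(count_series, path, folder, files_1, files_2, files_3, path_raw, i):
    # Hypothetical stand-in body: report which input files and output
    # directory row i would use instead of reading and writing CSV files.
    inputs = [path_raw + files_1[i], path_raw + files_2[i], path_raw + files_3[i]]
    return inputs, path + folder

# Hypothetical sample values, not taken from the question.
files_1 = ["a1.csv", "a2.csv"]
files_2 = ["b1.csv", "b2.csv"]
files_3 = ["c1.csv", "c2.csv"]

# Bind every fixed argument by name; only `i` is left open.
# Leaving out folder= here would make each job raise a TypeError
# that names `folder` rather than `i`.
row_job = partial(
    sum_row,
    count_series=None,
    path="./out/",
    folder="run1/",
    files_1=files_1,
    files_2=files_2,
    files_3=files_3,
    path_raw="./raw/",
)

# `i` must be passed by keyword too: a positional value would be bound to
# `count_series`, which is already set, and raise a TypeError.
results = [row_job(i=i) for i in range(len(files_1))]
for inputs, output_dir in results:
    print(inputs, output_dir)
```

With joblib the bound function should be usable the same way, e.g. Parallel()(delayed(row_job)(i=i) for i in range(len(files_1))), again passing i by keyword.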
mgracer