StratifiedKFold called from multiprocessing loop gives same results for each process

Question

I'm writing a custom GridSearchCV function that calls StratifiedKFold from a multiprocessing loop. StratifiedKFold gives the same accuracy n times, where n is the number of processes.

import multiprocessing

from sklearn.model_selection import StratifiedKFold

PROCESSES = 5

with multiprocessing.Pool(PROCESSES) as pool:
    params = range(5)  # placeholder: no parameter is passed to computeKNN for now
    results = [pool.apply_async(computeKNN) for p in params]

    for r in results:
        score = r.get()  # fetch each result once and reuse it
        print('\t', score)
        with open("output.txt", "a") as outfile:
            print(score, file=outfile)

def computeKNN():
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=None)
    results = {}
    i = 0
    score = 0
    best_score = 0

    ....
    for train_index, val_index in kf.split(X, y):

Inside the loop I fit the model and compute the accuracy score. In this example, when running 5 processes, I get the same accuracy 5 times. Am I missing a parameter for StratifiedKFold? I assumed I would get different results even when the code is run n times in parallel.
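
One likely explanation, assuming the workers are created with the fork start method (the default on Linux): with random_state=None, StratifiedKFold draws from NumPy's global random state, and each forked worker inherits an identical copy of that state from the parent, so every worker produces the same shuffle. A minimal sketch of one way around this is to pass a distinct seed to each task. The toy X and y, the helper name first_val_fold, and the seed list are assumptions made for illustration and are not part of the original code.

```python
import multiprocessing

import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data standing in for the X and y used in the question (assumption).
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)


def first_val_fold(seed):
    # An explicit per-task seed makes the shuffle independent of whatever
    # global NumPy random state the worker inherited from the parent process.
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    _, val_index = next(iter(kf.split(X, y)))
    return val_index.tolist()


if __name__ == "__main__":
    seeds = [0, 1, 2, 3, 4]  # one distinct seed per task
    with multiprocessing.Pool(5) as pool:
        for seed, fold in zip(seeds, pool.map(first_val_fold, seeds)):
            print(seed, fold)
```

In the question's code, this would correspond to giving computeKNN a seed argument and submitting it with pool.apply_async(computeKNN, (seed,)). Fixed seeds also make each run reproducible, which random_state=None does not.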
