
I have a single DataFrame on which I need to train the same model (LogisticRegression) multiple times.

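```python
from sklearn.linear_model import LogisticRegression

# df, X_test and y_test are defined elsewhere; the last column of df is the label
model = LogisticRegression()
_list_scores = []
for i in range(df.shape[0]):
    # Train on the first i+1 rows: every column except the last is a feature
    X_train = df.iloc[0:i+1, :-1]
    y_train = df.iloc[0:i+1, -1]  # -1 (not -1:) yields the 1-D Series that fit expects
    # Note: fit raises ValueError while y_train holds a single class (always true for i == 0)
    model.fit(X_train, y_train)
    _list_scores.append(model.score(X_test, y_test))
```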
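To compare timings between approaches, it helps to have a self-contained baseline. The sketch below builds hypothetical synthetic data with `make_classification` (the real `df`, `X_test` and `y_test` are not shown) and runs the same growing-prefix loop. Because `LogisticRegression` raises a `ValueError` when the training labels contain only one class, which is always the case for the first row, this sketch assumes it is acceptable to record `nan` for such prefixes instead of crashing. The helper names `make_data` and `sequential_scores` are illustrative only.

```python
import math

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def make_data(n_rows=200, n_test=100, seed=0):
    """Hypothetical stand-in for the question's df, X_test and y_test."""
    X, y = make_classification(n_samples=n_rows + n_test, n_features=5,
                               random_state=seed)
    df = pd.DataFrame(X, columns=[f"f{j}" for j in range(X.shape[1])])
    df["target"] = y  # last column is the label, as in the question
    train, test = df.iloc[:n_rows], df.iloc[n_rows:]
    return train, test.iloc[:, :-1], test.iloc[:, -1]


def sequential_scores(df, X_test, y_test):
    """Fit on the first 1, 2, ..., n rows and score each fit on the test set."""
    model = LogisticRegression()
    scores = []
    for i in range(df.shape[0]):
        X_train = df.iloc[: i + 1, :-1]
        y_train = df.iloc[: i + 1, -1]
        if y_train.nunique() < 2:
            # A single-class prefix cannot be fitted; mark it as missing.
            scores.append(math.nan)
            continue
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
    return scores


df, X_test, y_test = make_data()
scores = sequential_scores(df, X_test, y_test)
```

Wrapping the loop in a function makes it easy to time with `time.perf_counter` and to compare its output with any parallel version.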

The idea is to train the model on a growing slice of the DataFrame, starting with the first row and ending with all rows.

- Loop 1 = train with 1 row and measure the score
- Loop 2 = train with 2 rows and measure the score
- Loop 3 = train with 3 rows and measure the score
- ...
- Loop n = train with n rows and measure the score
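Because each loop's training set is the previous one plus a single row, the previous coefficients are usually a good starting point for the next fit. `LogisticRegression(warm_start=True)` reuses the solution of the previous `fit` call as the initialization (the scikit-learn docs note this has no effect with the `liblinear` solver). The sketch below, on hypothetical synthetic data, counts solver iterations with and without warm starting; how much it saves on the real data is an open question, and `prefix_scores` is an illustrative name.

```python
import math

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical data: the first 200 rows play the role of df, the rest the test set.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
cols = [f"f{j}" for j in range(X.shape[1])]
df = pd.DataFrame(X[:200], columns=cols).assign(target=y[:200])
X_test = pd.DataFrame(X[200:], columns=cols)
y_test = y[200:]


def prefix_scores(df, X_test, y_test, warm_start):
    """Growing-prefix loop; also returns the total number of solver iterations."""
    model = LogisticRegression(warm_start=warm_start)
    scores, total_iters = [], 0
    for i in range(df.shape[0]):
        X_train, y_train = df.iloc[: i + 1, :-1], df.iloc[: i + 1, -1]
        if y_train.nunique() < 2:
            scores.append(math.nan)  # single-class prefix cannot be fitted
            continue
        model.fit(X_train, y_train)  # starts from the previous coef_ when warm_start=True
        total_iters += int(model.n_iter_.max())
        scores.append(model.score(X_test, y_test))
    return scores, total_iters


cold_scores, cold_iters = prefix_scores(df, X_test, y_test, warm_start=False)
warm_scores, warm_iters = prefix_scores(df, X_test, y_test, warm_start=True)
print(f"solver iterations: cold={cold_iters}, warm={warm_iters}")
```

Warm starting makes every step depend on the one before it, so it is an alternative to distributing the steps across workers rather than something to combine with it.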

I tried concurrent.futures and dask.delayed, but for some reason my plain loop is faster than both of them.
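Parallel execution only pays off when each task is expensive compared with the cost of scheduling it. With worker processes, data has to be pickled and shipped to every task; with threads (the default scheduler for `dask.delayed`), the parts of `fit` that run Python code hold the GIL. A quick check is to time a single fit at a few training sizes, as in this sketch on hypothetical synthetic data (`median_fit_seconds` is an illustrative helper):

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical data; replace with df.iloc[:, :-1] / df.iloc[:, -1] for the real case.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)


def median_fit_seconds(n_rows, repeats=5):
    """Median wall-clock time of one LogisticRegression.fit on the first n_rows rows."""
    timings = []
    for _ in range(repeats):
        model = LogisticRegression()
        start = time.perf_counter()
        model.fit(X[:n_rows], y[:n_rows])
        timings.append(time.perf_counter() - start)
    return float(np.median(timings))


fit_times = {n: median_fit_seconds(n) for n in (50, 500, 1000)}
for n, t in fit_times.items():
    print(f"{n:>5} rows: {t * 1000:.2f} ms per fit")
```

If each fit takes only a few milliseconds, submitting one task per fit can easily cost more than the fit itself, which would explain a plain loop winning.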

Could someone please help me parallelize this?
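With the default `warm_start=False`, each `fit` starts from scratch, so the iterations are independent and can run in separate processes. One possible approach, sketched below, uses joblib (a scikit-learn dependency) instead of concurrent.futures and assumes the main cost is per-task overhead: prefix sizes are grouped into a small number of tasks, and sizes are interleaved so that each task mixes cheap small fits with expensive large ones. Each task fits a fresh `clone` of the estimator. The function names, the `n_chunks` parameter and the synthetic data are hypothetical.

```python
import math

import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def score_prefixes(estimator, X, y, X_test, y_test, sizes):
    """Fit a fresh clone on X[:n], y[:n] for each n in sizes; return the test scores."""
    results = []
    for n in sizes:
        if len(np.unique(y[:n])) < 2:
            results.append(math.nan)  # single-class prefix cannot be fitted
            continue
        model = clone(estimator).fit(X[:n], y[:n])
        results.append(model.score(X_test, y_test))
    return results


def parallel_prefix_scores(estimator, X, y, X_test, y_test, n_jobs=-1, n_chunks=8):
    """Score prefixes 1..len(X) using n_chunks tasks spread over n_jobs workers."""
    sizes = np.arange(1, len(X) + 1)
    # Interleave sizes (1, 9, 17, ... / 2, 10, 18, ...) so chunks have similar cost.
    chunks = [sizes[k::n_chunks] for k in range(n_chunks)]
    per_chunk = Parallel(n_jobs=n_jobs)(
        delayed(score_prefixes)(estimator, X, y, X_test, y_test, chunk)
        for chunk in chunks
    )
    scores = np.empty(len(sizes))
    for k, chunk_scores in enumerate(per_chunk):
        scores[k::n_chunks] = chunk_scores  # put results back in prefix order
    return scores.tolist()


X_all, y_all = make_classification(n_samples=300, n_features=5, random_state=0)
X, y = X_all[:200], y_all[:200]            # hypothetical training rows (df)
X_test, y_test = X_all[200:], y_all[200:]  # hypothetical held-out test set
scores = parallel_prefix_scores(LogisticRegression(), X, y, X_test, y_test, n_jobs=2)
```

For the DataFrame in the question, `X = df.iloc[:, :-1].to_numpy()` and `y = df.iloc[:, -1].to_numpy()` would be the inputs. Whether this beats the plain loop still depends on how long each fit takes. scikit-learn's `learning_curve` (which accepts `n_jobs`) implements a similar train-size sweep around cross-validation and may also be worth a look.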

rej
  • Hi, could you also add how you tried to implement it with both concurrent.futures and Dask? How long does each model.fit call take? – Guillaume EB Apr 28 '23 at 14:47
