
I have developed a Python script that reads an Excel file and trains a model using scikit-learn's GridSearchCV with the n_jobs parameter:

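from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

def create_table():
    # tuned_parameters, x_data, y_data and new_x_data are defined elsewhere in the script
    my_model = GridSearchCV(GradientBoostingRegressor(), tuned_parameters, cv=5, scoring='neg_mean_absolute_error', n_jobs=7)
    my_model.fit(x_data, y_data)
    return my_model.predict(new_x_data)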
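What `n_jobs=7` spreads across processes is the set of individual fits. GridSearchCV fits one model per parameter combination per CV fold, plus one final refit on all the data because `refit=True` by default. Below is a minimal stdlib sketch of that count. The real `tuned_parameters` isn't shown, so the grid here is hypothetical, and so is the `expand_grid` helper, which mimics how a single-dict grid expands into candidates:

```python
from itertools import product

# Hypothetical grid: the question's actual tuned_parameters is not shown.
tuned_parameters = {
    "n_estimators": [100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
}

def expand_grid(grid: dict) -> list[dict]:
    """Hypothetical helper: return every combination of the grid's values as dicts."""
    keys = sorted(grid)  # sort keys for a deterministic order
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(tuned_parameters)
cv = 5
total_fits = len(candidates) * cv  # each candidate is fitted once per CV fold

print(len(candidates), total_fits)  # 12 60
```

Adding a value to any list multiplies the total. That multiplication is why losing the parallelism (falling back to one job) hurts so much here.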

This works perfectly when I run the script directly. But when I trigger it from a button click in a Dash app, I get this warning and the fit falls back to a single job:

Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1
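The warning comes from joblib, which scikit-learn uses to implement `n_jobs`. Its wording suggests that the multiprocessing backend will not start worker processes when it is called from a thread other than the main thread. A Dash callback is executed by the Flask development server, which (assumption: default settings) handles requests on worker threads. The stdlib sketch below only illustrates that condition. `running_in_main_thread` and `simulated_callback` are hypothetical helpers, not joblib's actual check:

```python
import threading

def running_in_main_thread() -> bool:
    """Hypothetical helper: True if the caller runs on the interpreter's main thread."""
    return threading.current_thread() is threading.main_thread()

def simulated_callback(results: list) -> None:
    # Hypothetical stand-in for a Dash callback; a threaded dev server
    # would invoke it from a worker thread like this one.
    results.append(running_in_main_thread())

# Running the script directly: the code executes on the main thread.
direct = running_in_main_thread()

# Triggered "from a button click": the code executes on a worker thread.
results = []
worker = threading.Thread(target=simulated_callback, args=(results,))
worker.start()
worker.join()
from_thread = results[0]

print(direct, from_thread)  # True False
```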

My Dash app looks like this:

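import dash
import dash_html_components as html
import pandas as pd
from dash.dependencies import Input, Output

def generate_html_table(dataframe, max_rows=50):
    return html.Table(
        # Header: an empty cell above the index column, then one cell per column
        [html.Tr([html.Th('')] + [html.Th(col) for col in dataframe.columns])] +

        # Body: the index value followed by each column's value, up to max_rows rows
        [html.Tr([html.Td(dataframe.index[i])] + [html.Td(dataframe.iloc[i][col]) for col in dataframe.columns])
         for i in range(min(len(dataframe), max_rows))]
    )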

app = dash.Dash()
app.layout = html.Div([
    html.H1(children='Region Forecast',
            style={'textAlign': 'center'}),
    html.Button(id='submit-button', n_clicks=0, children='Submit',
                style={'margin': 'auto', 'display': 'block'}),
    # A Div container, since the callback returns a complete html.Table
    html.Div(id='output-table', children=generate_html_table(pd.DataFrame()))
])

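@app.callback(Output('output-table', 'children'),
              [Input('submit-button', 'n_clicks')])
def reactive_compute(n_clicks):
    print('inside reactive compute')
    # create_table() returns a NumPy array of predictions; wrap it so generate_html_table can use .columns and .iloc
    my_table = pd.DataFrame(create_table())
    return generate_html_table(my_table)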

if __name__ == '__main__':
    app.run_server(debug=True)

I've seen the question "Multiprocessing backed parallel loops cannot be nested below threads", but it doesn't help me because I don't manage the multiprocessing myself; scikit-learn does it internally.
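The script already works when run on its own. One workaround, not taken from this thread, is therefore to launch the training as a separate Python process from the callback. The child interpreter runs its code on its own main thread, so the thread restriction in the warning should not apply there. Below is a minimal stdlib sketch. The inline `-c` snippet stands in for a hypothetical training script such as `train_model.py`, and `run_in_fresh_interpreter` is a hypothetical helper:

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for the real training code; a train_model.py would
# instead call create_table() and write its predictions to a file or stdout.
CHILD_CODE = (
    "import threading; "
    "print(threading.current_thread() is threading.main_thread())"
)

def run_in_fresh_interpreter(code: str) -> str:
    """Hypothetical helper: run `code` in a new Python interpreter, return stripped stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return completed.stdout.strip()

# Simulate a Dash callback, which runs on a server worker thread.
results = []
caller = threading.Thread(target=lambda: results.append(run_in_fresh_interpreter(CHILD_CODE)))
caller.start()
caller.join()

print(results[0])  # True: the child executes its code on its own main thread
```

The trade-off is start-up cost. Each click pays for a new interpreter and fresh imports of pandas and scikit-learn, and the predictions have to be passed back, for example through a file the callback reads afterwards.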

The app only needs to run locally; I am not planning to deploy it to a web server.

Can I keep the parallel model fitting when it is triggered from the Dash app, and if so, what is the best way to approach this?

Joos Korstanje

1 Answer


Are you using Windows? I have the exact same issue on Windows, so I tried running the app on Ubuntu, and there it works fine.

Nowadays you can install a Linux environment on Windows (Windows Subsystem for Linux) from the Microsoft Store if you don't want to mess about with virtual machines or a full install. It's great for testing and development.

Edit: GridSearchCV seems to handle it okay, but I still get that warning when I run the regressor by itself.

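Edit 2: GridSearchCV was using all threads but only loading each of them to 10-20%. Running the app with gunicorn instead of the built-in development server solves this (gunicorn runs on Unix-like systems such as Ubuntu, not natively on Windows):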

gunicorn my_app:server

Also add the following to my_app.py, after the app = dash.Dash() line, so that gunicorn can find the server object:

server = app.server
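A plausible reason gunicorn helps is its default `sync` worker, which handles each request on the worker process's main thread and leaves joblib free to start its own processes. That mechanism is an assumption, not something confirmed above. If you go this route, a `gunicorn.conf.py` next to `my_app.py` can pin that setup and give long fits more time. The specific values below are illustrative assumptions for a local, single-user app:

```python
# gunicorn.conf.py: gunicorn loads this file from the working directory by default.
# All values are illustrative assumptions, not requirements.

bind = "127.0.0.1:8050"  # localhost only, on Dash's usual port
workers = 1              # one worker process; GridSearchCV's n_jobs provides the parallelism
worker_class = "sync"    # default worker: one request at a time, no request thread pool
threads = 1              # with threads > 1, gunicorn switches sync to gthread (request threads)
timeout = 600            # seconds; the 30 s default could kill the worker mid-fit
```

With this file in place, the same `gunicorn my_app:server` command picks it up.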
Mitch