
I am using auto-sklearn in Python.

The code works fine, but when I change the parameter to n_jobs=-1, it causes this error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown

I googled and found a solution:

https://github.com/automl/auto-sklearn/issues/996

The solution states that adding

if __name__ == '__main__':

should fix the problem.

I did that, but I am still getting the same error.

Am I using it in the wrong way?

Can someone advise whether I am placing that line correctly and how I should use it?
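
For reference, my understanding of the idiom from the multiprocessing docs is this minimal sketch (run is just a placeholder name, not my real code):

import multiprocessing

def run():
    # anything that starts worker processes (e.g. model fitting) goes here
    pass

if __name__ == '__main__':
    multiprocessing.freeze_support()  # only needed for frozen executables
    run()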

Here is my code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
import pyodbc 
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
import datetime
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_predict
#import winsound
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier
import time
import autosklearn.classification

if __name__ == '__main__':            

    df = pd.read_csv("c:\\my.csv")
    
    X = df.drop('Code', axis=1, errors='ignore')
    y = df['Code']  # assuming 'Code' is the target column
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    mdl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=60 * 5,
        per_run_time_limit=30 * 1,
        n_jobs=-1,
        memory_limit=1024 * 10,
        initial_configurations_via_metalearning=0,
        smac_scenario_args={'runcount_limit': 50},
    )
    
    
    mdl.fit(X_train, y_train)
    y_pred = mdl.predict(X_test)
  • New Python programmers are often mystified by `if __name__ == '__main__':`, but all it does is check whether the script it is in happens to be the script that was used as the entry point for the running program, or whether it was imported as a module. If it's the entry point, `__name__` will be `'__main__'`; otherwise it will be the name of the module (see the sketch after these comments). It has nothing to do with your problem; if another solution led you to believe that, it must have been about some other, seemingly related, problem. – Grismar Feb 06 '22 at 00:40
  • You say you experience problems when `n_jobs` is set to `-1` (i.e. use all available processors), but do you have an example value at which it doesn't cause problems for you? Any other value? Also, you specify 10 GiB (10 * 1024 MiB) of memory per job, do you actually have the memory space to provide 10 GiB per processor in the system you run on? – Grismar Feb 06 '22 at 00:43
  • @Grismar Yes, I do have the memory, and it works with n_jobs = 1. – asmgx Feb 06 '22 at 03:17
  • Have you tried increasing `n_jobs` step by step to see at which value it first breaks? Even if you have the 80 GB of memory (assuming an 8-core chip), some other resource may run out, causing the worker processes to have trouble starting and leading to these errors down the line. – Grismar Feb 06 '22 at 05:14
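
To illustrate the point about `__name__` from the first comment, here is a minimal sketch (`mymodule` is a hypothetical file name):

# mymodule.py (hypothetical file name)
print(__name__)
# `python mymodule.py` prints: __main__
# `import mymodule` from another script prints: mymodule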

0 Answers