I have recently been trying to set up a Pipeline to produce a machine learning model. I have built my own data preprocessing classes and a new class with an optimized sklearn algorithm, Regresor_Model. However, when I declare the pipeline steps, for example:

from source.preprocessing_functions import Change_Data_Type, Years_Passed, Duplicated_Data
from source.preprocessing_functions import One_Hot_Encoding_Train, Standard_Scaling_Train, Reduce_Memory_Usage
from source.machine_learning_toolbox import Regresor_Model
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Loading the data
# ================
data = lp.load_data(config.DATA,config.ID_VAR)
X, y = data.drop(config.TARGET,axis=1), data[config.TARGET]
X = X[config.PREDICTORS]

# Train-Test Split
# ================
X_train, X_tests, y_train, y_tests = train_test_split(X, y, test_size=0.3, random_state=config.SEED)


# Defining the Pipeline steps
# ===========================
steps = [('to_float', Change_Data_Type('Kms_Driven','Float')), ('years_passed', Years_Passed('Year',config.YEAR)),
         ('duplicates', Duplicated_Data()), ('one_hot_train', One_Hot_Encoding_Train(config.CATEGORICAL,drop_first=False)),
         ('scale_train', Standard_Scaling_Train(config.NUMERICAL)), 
         ('reduce_memory', Reduce_Memory_Usage()), ('model', Regresor_Model(config.BOUNDS))]

# Producing the pipeline
# ======================
pipeline = Pipeline(steps)
pipeline.fit(X_train, y_train)

and run the script, I get an error message saying that the module sklearn.preprocessing_functions cannot be found.

preprocessing_functions and machine_learning_toolbox are two scripts where I have stored the preprocessing classes and the optimized machine learning algorithm. The examples I have seen use sklearn's Pipeline only with classes that ship with sklearn, such as estimators = [('reduce_dim', PCA()), ('clf', SVC())].

Is there a workaround that lets us build an sklearn pipeline from our own preprocessing tools?

  • Can you include the *exact* error message you got? It is possible to define custom pipeline steps, but it's unclear where the error is coming from at the moment. – Alexander L. Hayes Dec 16 '22 at 16:53
  • Currently: `sklearn.preprocessing_functions` does not appear in the code linked above. But `source.preprocessing_functions` does. Maybe there's a typo? – Alexander L. Hayes Dec 16 '22 at 17:09
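
As the first comment notes, custom pipeline steps are possible. A minimal sketch of one is shown here, assuming the step only needs `fit` and `transform`. The class `YearsPassed`, its `column` and `reference_year` arguments, and the toy data are all hypothetical stand-ins, not the asker's actual `Years_Passed` class, and `LinearRegression` is swapped in for `Regresor_Model`, whose code is not shown.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class YearsPassed(TransformerMixin, BaseEstimator):
    """Hypothetical step: replace a year column with the years elapsed since it."""

    def __init__(self, column, reference_year):
        # Store constructor arguments unchanged so get_params() and clone() work.
        self.column = column
        self.reference_year = reference_year

    def fit(self, X, y=None):
        # Nothing to learn here, but fit must still return self.
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # copy, so the caller's data is untouched
        X[:, self.column] = self.reference_year - X[:, self.column]
        return X


# Toy data (made up): column 0 is a year, column 1 is a mileage-like feature.
X = np.array([[2015, 30.0], [2018, 12.0], [2020, 5.0], [2012, 60.0]])
y = np.array([4.0, 6.5, 8.0, 2.5])

pipeline = Pipeline([
    ("years_passed", YearsPassed(column=0, reference_year=2022)),
    ("model", LinearRegression()),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Every step except the last needs `fit` and `transform`, and the last needs at least `fit`. Note that `Pipeline` passes only `X` through `transform`, so a step that drops rows (as a duplicate-removal step presumably would) leaves `y` misaligned and is usually better applied before the pipeline. None of this explains the import error itself, which depends on the exact message requested in the comments.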
