Questions tagged [scikit-learn-pipeline]

92 questions
0
votes
0 answers

I need to understand the following error: "CatBoostError: The left argument is not fitted, only fitted models could be compared."

I am trying to run RandomizedSearchCV on various classification models in a for loop for hyperparameter tuning. Every model runs without issue except CatBoost. The CatBoost issue arises when I use a Pipeline in…
0
votes
1 answer

Is get_feature_names_out from scikit-learn SimpleImputer working?

I'd like to access the names of the columns that were imputed by scikit-learn's SimpleImputer and create a DataFrame. According to the documentation, this should be possible with the get_feature_names_out method. However, when I try the following example…
0
votes
1 answer

LassoCV getting axis -1 is out of bounds for array of dimension 0 and other questions

Good evening to all, I am trying to implement LassoCV with sklearn for the first time. My code is as follows: numeric_features = ['AGE_2019', 'Inhabitants'] categorical_features = ['familty_type','studying','Job_42','sex','DEGREE', 'Activity_type',…
0
votes
1 answer

Divide by zero encountered in true_divide f = msb / msw with SelectKBest

I tried to add SelectKBest to my pipeline to improve my existing near model. Without this new step, the model gave me the following results: Best test negative MSE of the base model: -62.60 Best test R2 of the base model:…
0
votes
1 answer

Get OOB score within a pipeline for Random Forest

For a machine learning project, I was wondering whether it is possible to use RandomForestRegressor inside a pipeline. Specifically, I need to determine the OOB score from a RandomForestRegressor, but my data requires a lot of preprocessing. I tried…
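One way this can be approached, sketched under assumptions: the forest is placed as the last step of a Pipeline with oob_score=True, and the fitted estimator is read back through named_steps. The synthetic data, the StandardScaler step, and the step names "scale" and "rf" are placeholders, not the asker's actual preprocessing.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # placeholder for heavier preprocessing
    ("rf", RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)),
])
pipe.fit(X, y)

# The fitted forest is reachable by step name; oob_score_ is its OOB R^2.
oob_r2 = pipe.named_steps["rf"].oob_score_
```

Note that the preprocessing steps are fitted on every training row, so when they learn from the data the OOB estimate may be slightly optimistic compared with a fully held-out score.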
0
votes
1 answer

Fails to save model after running GridSearchCV with a scikit pipeline

I have the following toy example to replicate the issue: import numpy as np import xgboost as xgb from sklearn.datasets import make_regression from sklearn.pipeline import Pipeline from sklearn.model_selection import GridSearchCV X, y =…
Li-Pin Juan
0
votes
0 answers

Specifying the columns using strings is only supported for pandas DataFrames

The code runs fine on the train and test data, but predicting on a sample input raises an error: test1 = pd.DataFrame(data=np.array(['MBBS', 'Psychiatrist', 8, 'Dadar', 'Mumbai', 10,…
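As a rough illustration of the situation in the title: a ColumnTransformer that selects columns by string name needs a pandas DataFrame with those same names at predict time. The sketch below is not the asker's code; the columns "city" and "experience", the target values, and the model are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data; the names and values are illustrative only.
train = pd.DataFrame({
    "city": ["Mumbai", "Delhi", "Mumbai", "Delhi"],
    "experience": [8, 3, 12, 5],
})
fees = [300, 100, 500, 200]

# Columns are selected by string name, so the input must be a DataFrame.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)
model = Pipeline([("pre", pre), ("reg", LinearRegression())])
model.fit(train, fees)

# Build the single sample as a DataFrame with the training column names.
# A list of rows keeps each value's own type, unlike a mixed np.array.
sample = pd.DataFrame([["Mumbai", 10]], columns=train.columns)
prediction = model.predict(sample)
```

A NumPy array built from mixed strings and numbers stores every entry as a string, so constructing the sample from a list of rows (or a dict) may also avoid a second, dtype-related surprise.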
0
votes
0 answers

Building Pipelines

I've recently been trying to set up a Pipeline to produce a machine learning model. I have built my own data preprocessing classes and a new class with an optimized sklearn algorithm, Regresor_Model; however, when I declare the pipeline steps, for…
0
votes
0 answers

Preprocessing and data transformation in machine learning

I have a problem where I have to predict a buyer using machine learning (I created a dummy dataset). I need to transform the data before I can use it for machine learning. I am aggregating information at the id, visit level, which gives me a list of…
0
votes
1 answer

Get features names from scikit pipelines

I am working on an ML regression problem where I defined a pipeline, based on an online tutorial, as shown below: pipe1 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.LinearRegression())]) pipe2 =…
0
votes
0 answers

Sklearn manually add feature in pipeline after feature selection

I would like to add features manually after feature selection. For example, take this simple pipeline: pipe = Pipeline([('feature_selection', SelectFromModel(LinearSVC())), ('clf', ExtraTreesClassifier())]) After…
0
votes
1 answer

Incompatible row dimensions when using passthrough in GridSearch over sklearn Pipeline with FeatureUnion

I am trying to do grid search over a sklearn pipeline that uses a custom transformer in a pipeline with FeatureUnion. It works fine when the pipeline uses the custom transformer class in FeatureUnion; however, it fails when the custom class is…
0
votes
0 answers

Am I implementing Pipeline with GridSearchCV for Regression correctly?

I'm practising machine learning algorithms (Lasso regression and decision trees) using sklearn.pipeline and sklearn.model_selection.GridSearchCV. I have split my dataset into training and test sets. The following is my code. I wanted to know if my…
0
votes
0 answers

Clustering Step within Scikit Pipeline

I am trying to do clustering as a step in a Pipeline so that I can use the cluster as an additional feature. I have used this post as a guide but I am getting an error on the call to fit_transform() within the pipeline. My original transformer is…
0
votes
0 answers

scikit learn pipelines and `ColumnTransformer`

I am confused by the following Pipeline weirdness. Suppose I define a pipeline thus: pipe = Pipeline([ ('transformer', ColumnTransformer([('sc', StandardScaler(), [0, 1])])), ('model', LinearRegression()) ]) Now define a DataFrame thus: df…
Igor Rivin