Questions tagged [scikit-learn-pipeline]

92 questions
2
votes
1 answer

How to use the feature_names_out with scikit's FunctionTransformer

I am trying to use feature_names_out on scikit-learn's FunctionTransformer to keep the same feature names, but I get this error: Code: from sklearn.preprocessing import FunctionTransformer X = pd.Series(data=[1, 2, 3], name='numbers') transformer =…
2
votes
0 answers

How can I get the feature names after several fit_transform's from sklearn?

I'm running a machine learning model that requires multiple transformations. I applied polynomial transformations, interactions, and also a feature selection using SelectKBest: transformer = ColumnTransformer( transformers=[("cat",…
2
votes
1 answer

Unable to load pickled custom estimator sklearn pipeline

I have an sklearn pipeline that uses a custom column transformer, a custom estimator, and several lambda functions. Because pickle cannot serialize lambda functions, I am using dill. Here is the custom estimator I have: class customOLS(BaseEstimator): …
Obiii
2
votes
1 answer

Extracting feature importances from an sklearn pipeline containing a multioutputclassifier within gridsearchcv?

I'm wondering whether I can extract feature importances with names from a scikit-learn pipeline that I've built. The pipeline contains a Gradient Boosting Classifier wrapped in a Multi Output classifier. The pipeline is part of a GridSearchCV…
2
votes
2 answers

AttributeError scikit learn pipeline based class

I am trying to write an sklearn-based feature extraction pipeline. My pipeline code idea can be split into a few parts: a parent class where all data preprocessing (if required) could happen from sklearn.base import BaseEstimator,…
abhiieor
2
votes
0 answers

Specific Decision Rule from Decision Tree Classifier Pipeline With Vectorizing and Feature Union

In order to get the specific rules applied to a trained sample on a decision tree classifier, we need to use the decision_path method: decision_path(X[, check_input]). Now, working on a short text classification model, I have pipelined a feature…
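A rough sketch of one way to call decision_path when the tree sits at the end of a pipeline: transform the raw text with every step except the last, then pass the result to the fitted tree. The toy texts, labels, and step names below are hypothetical and are not taken from the question, whose pipeline is truncated.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data for illustration only.
texts = ["good movie", "bad movie", "good plot", "bad plot"]
labels = [1, 0, 1, 0]

# Hypothetical step names; the union combines word counts and char TF-IDF.
features = FeatureUnion([
    ("words", CountVectorizer()),
    ("chars", TfidfVectorizer(analyzer="char")),
])
pipe = Pipeline([
    ("features", features),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(texts, labels)

# Apply all steps except the final classifier, then ask the tree
# which nodes the first sample passes through.
X_vec = pipe[:-1].transform(texts[:1])
tree = pipe[-1]
node_indicator = tree.decision_path(X_vec)

# Row 0 of the sparse indicator lists the node ids visited by sample 0.
start, end = node_indicator.indptr[0], node_indicator.indptr[1]
node_ids = node_indicator.indices[start:end]
print(node_ids)
```

The thresholds and feature indices for each visited node can then be read from tree.tree_.threshold and tree.tree_.feature, and mapped back to names with pipe[:-1].get_feature_names_out() where the installed version supports it.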
2
votes
1 answer

How to pickle TPOT fitted pipeline?

I'm using the TPOT classifier, and after training the model, I want to save the best pipeline; I can get it using model.fitted_pipeline_. This is an example of one of the outputs: Pipeline(steps=[('extratreesclassifier', …
Rodrigo A
2
votes
1 answer

Feature mismatch: Prediction through scikit-learn Pipeline

I implemented the following scikit-learn pipeline inside a file called build.py and later pickled it successfully. preprocessor = ColumnTransformer(transformers=[ ('target', TargetEncoder(), COL_TO_TARGET), ('one_hot',…
eager_learner
2
votes
0 answers

scikit-learn: Retrieve model object from the pipeline

I have built the following pipeline, and I want to obtain the random forest model object that gets built inside it. The rf object is only the initialization, and it doesn't have rf.estimators_ grid_params = [{'bootstrap': [True], …
add-semi-colons
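A possible explanation, sketched under assumptions: if the pipeline is tuned with GridSearchCV (the grid_params in the excerpt suggest this, but the code is truncated), the search clones the estimator, so the original rf object is never fitted. The fitted copy can be reached through best_estimator_. The dataset, step names, and parameter grid below are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data and step names, for illustration only.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

rf = RandomForestClassifier(random_state=0)
pipe = Pipeline([("scale", StandardScaler()), ("rf", rf)])

param_grid = {"rf__n_estimators": [5, 10], "rf__bootstrap": [True]}
grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)

# GridSearchCV fits clones, so the original `rf` stays unfitted.
original_is_fitted = hasattr(rf, "estimators_")

# The refitted best pipeline holds the fitted forest under its step name.
fitted_rf = grid.best_estimator_.named_steps["rf"]
n_trees = len(fitted_rf.estimators_)
print(original_is_fitted, n_trees)
```

This relies on the default refit=True; with refit=False there is no best_estimator_ attribute.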
1
vote
0 answers

All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough'

I was studying the In Depth: k-Means Clustering section of Jake VanderPlas's Python Data Science Handbook and came across the following code block: from sklearn.datasets import load_digits from sklearn.manifold import TSNE from…
1
vote
0 answers

Imputation of mixed data types with pandas and Scikit-Learn

I have to create a pre-processing pipeline dynamically to impute missing values; that is, I want to go through all the columns in a pandas data frame (which I don't know beforehand) and impute their missing values. To impute the missing values I…
Rodrigo A
1
vote
2 answers

AttributeError: 'ColumnTransformer' object has no attribute '_name_to_fitted_passthrough'

I am predicting the IPL match win probability. When deploying the model with Streamlit, it shows this error: AttributeError: 'ColumnTransformer' object has no attribute '_name_to_fitted_passthrough' Here is my code: from sklearn.compose import…
1
vote
0 answers

GLMM alike solution - adding an interaction step as an element of scikit-learn Pipeline for columns transformed in previous steps

I'm trying to create a solution that is somewhat similar to the Mixed Effects Model (GLMM), which is not present in scikit-learn at the moment. Imagine a simple heart-disease dataset from…
1
vote
1 answer

fit() missing 1 required positional argument: 'y'

X = df.drop(columns="CLASS") y = df.CLASS X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) X_train.shape, X_test.shape, y_train.shape, y_test.shape preprocessor = ColumnTransformer([ ('numeric',…
1
vote
0 answers

How do I extract feature importances from a Sklearn pipeline

I'm wondering how I can extract feature importances from Logistic regression, GBM and XGBoost in scikit-learn, together with the feature names, when using the classifier in a pipeline with preprocessing. I want to know how to extract feature importances from…