Questions tagged [scikit-learn-pipeline]

92 questions
1
vote
1 answer

Visualize sklearn StackingClassifier model pipeline construct

With a scikit-learn pipeline we can visualize the pipeline construct (see the screenshot below). I couldn't find a similar plotting feature for a sklearn stacking classifier. How can I represent the ensemble model construct with a sklearn stacking classifier?
1
vote
0 answers

SHAP library does not support TransformedTargetRegressor: how to calculate SHAP values in native units with a target transform

We are working on a regression problem where we want to apply a log transform to the target variable. After model training, we want to obtain SHAP values in native units so that they are suitable for end users. When trying to create an explainer by…
1
vote
0 answers

Apply SimpleImputer on selected columns, skip if no columns are present

I have a scikit-learn pipeline and I intend to encode a categorical feature. The problem is that an earlier step deletes the feature based on some logic, and in the encoding step I want to encode only if there is the feature…
Obiii
1
vote
1 answer

Pass input from one step to another step in a ColumnTransformer scikit-learn pipeline

I have a pipeline that looks like this: categorical_transformer = Pipeline(steps=[ ('categorical_imputer', SimpleImputer(strategy="constant", fill_value='Unknown')), ('encoder',…
Obiii
1
vote
0 answers

Clip output from sklearn pipeline predict

Let's see the following pipeline: scaler = ScalerFactory.get_scaler(scaler_type) model = MultiOutputRegressor(lgb.LGBMRegressor(metric='tweedie', **hyperparameters)) steps = [('scaler', scaler), ('model', model)] pipeline =…
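One possible way to clip predictions, sketched under assumptions since the excerpt is truncated: wrap the pipeline in `TransformedTargetRegressor` with an identity `func` and a clipping `inverse_func`. The `LinearRegression` model, the toy data, and the helper names `identity` and `clip_at_zero` are stand-ins for illustration and are not taken from the question.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def identity(y):
    # Leave the target untouched during fitting.
    return y


def clip_at_zero(y):
    # Applied to the regressor's predictions: force them to be non-negative.
    return np.clip(y, 0, None)


# Stand-in pipeline (assumption): the question uses a LightGBM model instead.
pipeline = Pipeline([("scaler", StandardScaler()), ("model", LinearRegression())])

clipped = TransformedTargetRegressor(
    regressor=pipeline,
    func=identity,
    inverse_func=clip_at_zero,
    check_inverse=False,  # clipping is not a true inverse of identity
)

X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
clipped.fit(X, y)
predictions = clipped.predict(X)  # negative predictions are clipped to 0
```

Because the clipping lives inside `inverse_func`, it is applied on every `predict` call and the wrapped object can still be used wherever a regressor is expected.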
1
vote
1 answer

How can I access fitted estimators in a ColumnTransformer?

I designed my sklearn pipeline in the following way: transf_pipe = make_column_transformer( (binning_pipe, ['na', 'nc']), (OneHotEncoder(drop='if_binary'), ['site_type']), (make_pipeline( common_country, OneHotEncoder()),…
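A minimal sketch of how fitted sub-estimators can be reached through `named_transformers_`, assuming a much simpler transformer than the truncated one above. The toy data frame and the choice of `StandardScaler` are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; only the column names echo the excerpt.
df = pd.DataFrame({
    "na": [1.0, 2.0, 3.0, 4.0],
    "site_type": ["a", "b", "a", "b"],
})

ct = make_column_transformer(
    (StandardScaler(), ["na"]),
    (OneHotEncoder(), ["site_type"]),
)
ct.fit(df)

# make_column_transformer names each entry after its lowercased class name.
fitted_scaler = ct.named_transformers_["standardscaler"]
fitted_encoder = ct.named_transformers_["onehotencoder"]

scaler_mean = fitted_scaler.mean_               # learned mean of "na"
encoder_categories = fitted_encoder.categories_  # learned categories of "site_type"
```

When an entry is itself a pipeline, the same lookup returns the fitted `Pipeline`, whose steps can then be reached through `named_steps`.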
3nomis
1
vote
0 answers

Trying to bootstrap a code block using the scikit-learn roc_auc_score

I have a bit of code that I'm trying to duplicate, except that rather than my matches being encoded, I just have a binary 0 or 1 for my data in the field that is to be indexed. If I substitute 1 or 0 for the "normal." I receive an error stating that it is not in the…
1
vote
1 answer

Is `sklearn.Pipeline` with regex really more performant than `spacy` for preprocessing huge volumes of text?

TL;DR I need help selecting between spacy and sklearn for processing a huge text corpus. I ran a test to measure the performance of each, but the results were unexpected. Moreover, because I'm new-ish to the frameworks involved, I lack confidence…
1
vote
1 answer

Multiple artifact paths when logging a model using mlflow and sklearn

I'm using mlflow to log parameters and artifacts of a Logistic Regression, but when I try to log the model so I can see all the files in the Mlflow UI, I see two folders: one named 'model' and the other one named 'logger' (the one I set). model =…
1
vote
1 answer

Stacking up imputers in a pipeline

I have a question about stacking multiple sklearn SimpleImputers in a Pipeline: import numpy as np import pandas as pd from sklearn.pipeline import Pipeline from sklearn.impute import SimpleImputer pipeline = Pipeline([ ('si1',…
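Since the excerpt is cut off, here is only a rough sketch of what chaining two imputers can look like. The second step name `si2`, the `-1` sentinel, and the toy array are assumptions made for illustration. Each step receives the output of the previous one, so the second imputer sees data in which the `NaN` values are already filled.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Toy data (assumption): NaN marks one kind of gap, -1 marks another.
X = np.array([
    [1.0, np.nan],
    [-1.0, 4.0],
    [3.0, 6.0],
])

pipeline = Pipeline([
    # Step 1: replace NaN with the column mean.
    ("si1", SimpleImputer(missing_values=np.nan, strategy="mean")),
    # Step 2: replace the -1 sentinel with a constant.
    ("si2", SimpleImputer(missing_values=-1, strategy="constant", fill_value=0)),
])

result = pipeline.fit_transform(X)
```

Note that `SimpleImputer` returns a NumPy array by default, so column names from a DataFrame are not available to later steps unless pandas output is configured.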
1
vote
0 answers

Using different y in different stages of sklearn pipeline

I'm trying to build a sklearn.Pipeline for survival analysis including two stages: (1) class imbalance handling using imblearn classes; (2) scikit-survival classes for running survival analysis. The problem I'm having is an incompatibility of target features between…
1
vote
1 answer

Error : All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. StandardScaler() doesn't

I am trying to implement a model that uses ColumnTransformer() followed by SVC(). My transform method looks like: num_features = X_train_svm.select_dtypes(include=np.number).columns.to_list() cat_features =…
1
vote
1 answer

Fix a parameter in a scikit-learn estimator

I need to fix the value of a parameter of a scikit-learn estimator. I still need to be able to change all the other parameters of the estimator, and to use the estimator within scikit-learn tools such as Pipelines and GridSearchCV. I tried to define…
0
votes
1 answer

Value error using scikit-learn transformers

I am having trouble with a piece of code I am writing, specifically a pipeline. The data is a simple numerical dataframe (firewall logs), which is split into X_train and X_test in the usual way. After splitting, I devised a pipeline. This pipeline…
GEBRU
0
votes
0 answers

`cross_validate` not returning full pipeline

I have created a pipeline which looks like this - Pipeline(steps=[('preprocessor', ColumnTransformer(transformers=[('numerical_transform', RobustScaler(), …
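The excerpt is truncated, so the exact symptom is unknown. One common reading is that the fitted pipelines are wanted back from cross-validation, and `cross_validate` only returns them when `return_estimator=True` is passed. The sketch below assumes a toy dataset and a much smaller pipeline than the one in the question; the step names `scaler` and `clf` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic data (assumption), standing in for the question's dataset.
X, y = make_classification(n_samples=60, n_features=4, random_state=0)

pipe = Pipeline([
    ("scaler", RobustScaler()),
    ("clf", LogisticRegression()),
])

# return_estimator=True adds an "estimator" entry holding one fitted
# copy of the whole pipeline per cross-validation split.
cv_results = cross_validate(pipe, X, y, cv=3, return_estimator=True)

fitted_pipelines = cv_results["estimator"]
first_scaler = fitted_pipelines[0].named_steps["scaler"]
```

The original `pipe` object itself stays unfitted, because `cross_validate` works on clones of it.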