Questions tagged [scikit-learn-pipeline]

92 questions
0 votes, 2 answers

Iterate GridSearchCV Over Multiple Datasets and Classifiers (Python)

I have multiple datasets that I want to estimate parameters for using different classifiers (logistic and random forest). I want to run each dataset through both classifiers using GridSearchCV, and then get the best parameters for each classifier per…
GSA
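A minimal sketch of the nested loop described above, assuming each dataset is an `(X, y)` pair stored in a dictionary. The dataset names, parameter grids, and `cv=5` are illustrative placeholders, not taken from the question, and `make_classification` only stands in for real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical datasets: in practice these would be your own (X, y) pairs.
datasets = {
    "data_a": make_classification(n_samples=200, n_features=8, random_state=0),
    "data_b": make_classification(n_samples=200, n_features=8, random_state=1),
}

# One (estimator, parameter grid) pair per classifier; the grids are illustrative.
models = {
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

best_params = {}
for data_name, (X, y) in datasets.items():
    for model_name, (estimator, grid) in models.items():
        # GridSearchCV clones the estimator internally, so reusing it across datasets is safe.
        search = GridSearchCV(estimator, grid, cv=5)
        search.fit(X, y)
        best_params[(data_name, model_name)] = (search.best_params_, search.best_score_)

for key, (params, score) in best_params.items():
    print(key, params, round(score, 3))
```

Keying the results by `(dataset, classifier)` keeps the best parameters for every combination in one place; `search.cv_results_` holds the full grid if more than the winner is needed.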
0 votes, 1 answer

Scikit Learn IsolationForest: How to Fit Multiple Dataframes With Different Parameters (Not Using GridSearchCV)

I have five separate pandas dataframes that I've put inside a dictionary. I want to run five separate IsolationForest models in scikit-learn with a different set of parameters for each model. However, I don't want to run each model separately. So my…
GSA
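A minimal sketch of fitting one `IsolationForest` per dataframe in a single loop, assuming the parameter sets live in a second dictionary that uses the same keys as the dataframes. The frame names, random data, and parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical dictionary of five dataframes; replace with your own.
frames = {
    f"df{i}": pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
    for i in range(1, 6)
}

# Hypothetical per-frame parameter sets, keyed the same way as the frames.
params = {
    "df1": {"n_estimators": 100, "contamination": 0.05},
    "df2": {"n_estimators": 200, "contamination": 0.10},
    "df3": {"n_estimators": 100, "max_samples": 64},
    "df4": {"n_estimators": 150, "contamination": "auto"},
    "df5": {"n_estimators": 100, "max_features": 2},
}

models, labels = {}, {}
for name, df in frames.items():
    # Unpack this frame's parameters into its own IsolationForest.
    model = IsolationForest(random_state=0, **params[name])
    labels[name] = model.fit_predict(df)  # 1 = inlier, -1 = outlier
    models[name] = model
```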
0 votes, 1 answer

Debug and deploy the featurizer (data processor for model inference) of a SageMaker endpoint

I am looking at this example to implement the data processing of incoming raw data for a SageMaker endpoint prior to model inference/scoring. This is all great, but I have 2 questions: How can one debug this (e.g., can I invoke the endpoint without it…
cs0815
0 votes, 0 answers

Saving multiple trained models with sklearn

I'm trying to save multiple trained models for binary classification using pickle. I'm saving them in a pickle file as a tuple with the name of the model (name, model). However, when I load them and try to use the models all the predictions are…
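The question is cut off, so the actual cause is unknown. One common way to end up with identical predictions is appending the same estimator object, refitted in a loop, so that every `(name, model)` tuple points to one model. The sketch below, with illustrative models and synthetic data, gives each tuple its own copy via `sklearn.base.clone` and checks that the pickle round trip preserves predictions.

```python
import os
import pickle
import tempfile

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Unfitted "prototype" estimators, one per model name (choices are illustrative).
prototypes = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
]

# clone() gives every tuple its own estimator object, so no two names share one fitted model.
trained = [(name, clone(est).fit(X, y)) for name, est in prototypes]

path = os.path.join(tempfile.mkdtemp(), "models.pkl")
with open(path, "wb") as fh:
    pickle.dump(trained, fh)  # a list of (name, model) tuples

with open(path, "rb") as fh:
    loaded = pickle.load(fh)

for (name, model), (_, original) in zip(loaded, trained):
    # Each reloaded model should reproduce the predictions of the object that was saved.
    assert (model.predict(X) == original.predict(X)).all(), name
```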
0 votes, 1 answer

Pass information between pipeline steps in sklearn

I am working on a simple text generation problem with LSTMs. To make the preprocessing more compact and reproducible, I decided to implement everything in sklearn fashion, using custom sklearn transformers, and the KerasClassifier from scikeras to…
lazarea
0 votes, 1 answer

Access Random Forest Feature Names Attribute in a scikit-learn Pipeline after Feature Selection

I'm running a Random Forest classifier on a dataset as a step of a sklearn pipeline. # Numerical numeric_cols = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7'] numeric_transformer = Pipeline( steps=[("scaler", StandardScaler())] ) #…
0 votes, 1 answer

Pandas FunctionTransformer raises SettingWithCopyWarning

I'm learning to use pipelines and made a pretty simple pipeline with a FunctionTransformer to add a new column, an ordinal encoder and a LinearRegression model. But it turns out I'm getting a SettingWithCopyWarning when I run the pipeline, and I isolated the…
default-303
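The asker's function isn't shown, so this assumes the warning comes from assigning a new column (for example `df["new"] = ...`) inside the function when the input is a slice of another DataFrame. A minimal sketch that avoids it by returning a new frame with `DataFrame.assign`; the column names and data are hypothetical.

```python
import warnings

import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_ratio(df: pd.DataFrame) -> pd.DataFrame:
    # assign() builds a new DataFrame instead of writing into one that may be a slice.
    return df.assign(ratio=df["a"] / df["b"])


adder = FunctionTransformer(add_ratio)  # validate=False (the default) passes the DataFrame through

data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 5.0, 8.0],
    "c": ["x", "y", "x", "y"],
})
subset = data[data["a"] > 1]  # a filtered frame: the usual trigger for the warning

# Record every warning raised while the transformer runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = adder.fit_transform(subset)

setting_with_copy = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(out)
print("SettingWithCopy warnings:", len(setting_with_copy))
```

Starting the function with `df = df.copy()` before any assignment is an equivalent fix.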
0 votes, 1 answer

Sklearn pipeline breaks when using FunctionTransformer

I'm learning to use pipelines as they look cleaner. So, I'm working on the tabular playground competition on Kaggle. I'm trying to follow a pretty simple pipeline where I use a FunctionTransformer to add a new column to the dataframe, do Ordinal…
default-303
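The excerpt stops before the error, so this is only a sketch of a pipeline shaped like the one described, under the assumption that the later steps select columns by name. Because `FunctionTransformer` with its default `validate=False` returns a DataFrame here, the `ColumnTransformer` can still find the `size` column and pass the new `total` column through. All column names and values are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical engineered feature; returning a DataFrame keeps the column names.
    return df.assign(total=df["x1"] + df["x2"])


X = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "size": ["S", "M", "L", "S", "M", "L"],
})
y = [3.0, 5.5, 8.0, 8.5, 11.0, 13.5]

encode = ColumnTransformer(
    [("ordinal", OrdinalEncoder(categories=[["S", "M", "L"]]), ["size"])],
    remainder="passthrough",  # keep x1, x2 and the new "total" column
)

pipe = Pipeline([
    ("add_feature", FunctionTransformer(add_total)),
    ("encode", encode),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
preds = pipe.predict(X)
```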
0 votes, 1 answer

Different AUC-PR scores using Logistic Regression with and without Pipeline

I'm trying to understand why I get different AUC-PR scores using Logistic Regression with and without Pipeline. Here is my code with using Pipeline: column_encoder = ColumnTransformer([ ('ordinal_enc', OrdinalEncoder(),…
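A minimal sketch, assuming "AUC-PR" is computed with `average_precision_score`. When the manual version fits the same steps on the same training split as the Pipeline, the two scores should agree; a gap usually means the two versions differ somewhere, such as an encoder fitted on different data or a different `random_state`. The data and column names are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "value": rng.normal(size=n),
})
y = ((X["value"] + (X["color"] == "red")) > 0.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)


def make_encoder():
    return ColumnTransformer([("ordinal_enc", OrdinalEncoder(), ["color"])], remainder="passthrough")


# 1) With a Pipeline: the encoder is fitted on the training split only.
pipe = Pipeline([("enc", make_encoder()), ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)
ap_pipe = average_precision_score(y_test, pipe.predict_proba(X_test)[:, 1])

# 2) Without a Pipeline: the same steps by hand, fitted on the same training split.
enc = make_encoder().fit(X_train)
clf = LogisticRegression().fit(enc.transform(X_train), y_train)
ap_manual = average_precision_score(y_test, clf.predict_proba(enc.transform(X_test))[:, 1])

print(ap_pipe, ap_manual)
```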
0 votes, 1 answer

How to fix: ValueError: too many values to unpack (expected 2) PCA

I have two variables: numeric_cols = ['FamilyMembers', 'ChronicDiseases'] and I have this pipeline: numeric_transformer = Pipeline( steps=[('scaler', StandardScaler(), 'red_dim',…
rnv86
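The `steps` in the excerpt place two steps inside one tuple. `Pipeline` expects a list of `(name, estimator)` pairs, so each step needs its own tuple. A corrected sketch, assuming the `red_dim` step is a `PCA` (suggested by the title) and using `n_components=1` purely for illustration since the original value is cut off; the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_cols = ["FamilyMembers", "ChronicDiseases"]

# Each step is its own (name, estimator) 2-tuple. Packing two steps into one tuple,
# e.g. ('scaler', StandardScaler(), 'red_dim', PCA(...)), breaks the 2-value unpacking.
numeric_transformer = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("red_dim", PCA(n_components=1)),  # n_components chosen for illustration
])

preprocessor = ColumnTransformer([("num", numeric_transformer, numeric_cols)])

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "FamilyMembers": rng.integers(1, 8, size=50),
    "ChronicDiseases": rng.integers(0, 3, size=50),
})
Z = preprocessor.fit_transform(X)
```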
0 votes, 1 answer

New Feature in Scikit-Learn Pipeline - Interaction between two existing Features

I have two features in my dataset: height and Area. I want to create a new feature from the interaction of Area and height using a scikit-learn pipeline. Can anyone please guide me on how I can achieve this? Thanks
0 votes, 1 answer

Is it necessary to encode labels when using `TfidfVectorizer`, `CountVectorizer` etc.?

When working with text data, I understand the need to encode text labels into some numeric representation (e.g., by using LabelEncoder, OneHotEncoder etc.). However, my question is whether you need to perform this step explicitly when you're using…
rwb
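A small sketch, assuming a standard scikit-learn classifier as the final step: the vectorizer only converts the text in `X`, while scikit-learn classifiers accept string labels in `y` directly and store them in `classes_`. The texts and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it", "awful acting", "wonderful cast", "boring and bad"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]  # plain strings, never encoded by hand

# The vectorizer turns the text (X) into numbers; the classifier handles string labels (y) itself.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.classes_)
print(clf.predict(["great cast"]))
```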
0 votes, 1 answer

Test score NaN while trying to evaluate a decision tree regressor model

I am trying to evaluate the accuracy of a decision tree model using both numerical and categorical features from the Ames housing dataset. For the preprocessing of numerical features, I have used SimpleImputer and StandardScaler. As for the…
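The excerpt is cut off, so the cause here is a guess. By default, `cross_validate` and `cross_val_score` record a failed fit or score as NaN (`error_score=np.nan`), so passing `error_score="raise"` is a quick way to see the real exception. Two common triggers are unseen categories in a test fold (handled below by `handle_unknown="ignore"`) and a classification metric such as `"accuracy"` applied to a regressor. The column names and data are hypothetical stand-ins for the Ames features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 120
# Synthetic stand-in for two Ames-style columns (names are hypothetical).
X = pd.DataFrame({
    "LotArea": rng.normal(10_000, 2_000, size=n),
    "Neighborhood": rng.choice(["A", "B", "C"], size=n),
})
X.loc[::10, "LotArea"] = np.nan  # a few missing values for the imputer
y = rng.normal(200_000, 30_000, size=n)

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["LotArea"]),
    # Ignore categories that appear in a test fold but not in its training fold.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
model = Pipeline([("prep", preprocess), ("tree", DecisionTreeRegressor(random_state=0))])

# "r2" is a regression metric; error_score="raise" surfaces any failure instead of NaN.
scores = cross_validate(model, X, y, cv=5, scoring="r2", error_score="raise")
print(scores["test_score"])
```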
0 votes, 0 answers

cross-validation not working for custom meta-estimator

I have a two-stage meta-estimator that is initialized with two pipelines. The estimator is meant to classify observations into 1, -1, or 0. The first pipeline learns to distinguish 0 from (1, -1), and the second learns to distinguish 1 from -1,…
ADF
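Without the code, this is a guess, but cross-validation calls `clone`, which rebuilds the estimator from `get_params()`. That only works if `__init__` stores its arguments unchanged and all fitting happens in `fit`. A minimal sketch of a two-stage classifier written that way; the class name, the `zero_model`/`sign_model` parameters, and the use of `LogisticRegression` (in place of the asker's pipelines) are assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


class TwoStageClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical two-stage estimator: stage 1 separates 0 from nonzero,
    stage 2 separates 1 from -1 among the nonzero samples."""

    def __init__(self, zero_model, sign_model):
        # Store constructor arguments unchanged so get_params() and clone() work.
        self.zero_model = zero_model
        self.sign_model = sign_model

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        nonzero = y != 0
        # Fit clones so the estimators passed to __init__ are never modified.
        self.zero_model_ = clone(self.zero_model).fit(X, nonzero)
        self.sign_model_ = clone(self.sign_model).fit(X[nonzero], y[nonzero])
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        pred = np.zeros(len(X), dtype=int)
        is_nonzero = self.zero_model_.predict(X).astype(bool)
        if is_nonzero.any():
            # Only samples flagged as nonzero reach the second stage.
            pred[is_nonzero] = self.sign_model_.predict(X[is_nonzero])
        return pred


X, y012 = make_classification(n_samples=300, n_classes=3, n_informative=4, random_state=0)
y = y012 - 1  # relabel 0, 1, 2 as -1, 0, 1
est = TwoStageClassifier(LogisticRegression(max_iter=1000), LogisticRegression(max_iter=1000))
scores = cross_val_score(est, X, y, cv=5)
print(scores)
```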
0 votes, 1 answer

Including a Predictor in a Pipeline with Scikit-Learn

Actually, this doubt is more like: "why is this code working properly?". I was working out a problem from a textbook. Specifically, the problem was to build a Pipeline that had a Data Preparation phase (remove NA values, perform Feature Scaling…
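A minimal sketch of why this works, with synthetic data: in a `Pipeline`, every step except the last must be a transformer, but the last step may be any estimator, including a predictor. `fit` runs `fit_transform` through the earlier steps and then `fit` on the last one, and `predict` transforms the input through the earlier steps before calling the final estimator's `predict`. The step names and models are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [np.nan, 8.0], [5.0, 10.0]])
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0])

# Data preparation (impute NAs, scale) followed by a predictor as the final step.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X, y)            # fit_transform on impute and scale, then fit on the model
preds = pipe.predict(X)   # transform through impute and scale, then model.predict
```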