
I'm having trouble accessing attributes of intermediate steps in my sklearn pipeline. Here's my code:

from sklearn.pipeline import make_pipeline, make_union
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, PowerTransformer, OneHotEncoder

categorical_pipeline = make_pipeline(
    SimpleImputer(strategy='constant', fill_value='None'),
    OneHotEncoder(sparse=False))

ratings_pipeline = make_pipeline(
    RatingEncoder(),
    StandardScaler(),
    PowerTransformer(method='yeo-johnson'))

numeric_pipeline = make_pipeline(
    SimpleImputer(strategy='constant', fill_value=0),
    StandardScaler(),
    PowerTransformer(method='yeo-johnson'))

preprocess = make_pipeline(
    make_union(  
        # Select all categorical features and impute NA values into a unique category
        make_column_transformer(
            (categorical_pipeline, select_categorical_features),
            remainder='drop'
        ),      
        # Select all rating-encoded features and convert them to numerical, apply Scaling+PowerTransform
        make_column_transformer(
            (ratings_pipeline, select_rated_features),
            remainder='drop'
        ),   
        # Select all numeric features and impute, Scale+PowerTransform
        make_column_transformer(
            (numeric_pipeline, select_numeric_features),
            remainder='drop'
        ),     
    )
)

I know how to access intermediate steps of a pipeline. Here, I access the PowerTransformer() of the numeric_pipeline with the following line:

preprocess[0].transformer_list[2][1].transformers[0][1][2]

which returns

PowerTransformer(copy=True, method='yeo-johnson', standardize=True)

which leads me to believe that I've accessed that step correctly. However, I want to pull the .lambdas_ attribute from this PowerTransformer, but when I do so, I get the following:

AttributeError: 'PowerTransformer' object has no attribute 'lambdas_'

What am I doing wrong? I ran fit() on the pipeline and I'm accessing the PowerTransformer() step correctly, so why am I getting an AttributeError?

mrgoldtech

1 Answer


Okay I solved this myself.

preprocess[0].transformer_list[2][1].transformers[0][1][2].lambdas_

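is incorrect. Specifically, the transformers attribute of a ColumnTransformer returns the pre-fit transformers exactly as they were passed in, not the post-fit ones. ColumnTransformer fits clones and stores them in transformers_ (note the trailing underscore). The transformer_list of the FeatureUnion can stay as it is. The following code works: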

preprocess.steps[0][1].transformer_list[2][1].transformers_[0][1][2].lambdas_
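
To make the difference concrete, here is a minimal sketch on made-up data. It keeps only one numeric branch, and the random lognormal input and the integer column indices are assumptions for illustration, not part of the original code. Because the union has a single branch, the index into transformer_list is 0 here instead of 2.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Made-up data: 50 rows, 2 positive, skewed numeric columns.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(50, 2))

numeric_pipeline = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    StandardScaler(),
    PowerTransformer(method="yeo-johnson"),
)

# Same nesting as in the question, reduced to a single branch.
preprocess = make_pipeline(
    make_union(
        make_column_transformer((numeric_pipeline, [0, 1]), remainder="drop")
    )
)
preprocess.fit(X)

column_transformer = preprocess.steps[0][1].transformer_list[0][1]

# `transformers` still holds the object passed to the constructor (never fitted).
unfitted = column_transformer.transformers[0][1][2]
# `transformers_` holds the clone that was actually fitted.
fitted = column_transformer.transformers_[0][1][2]

print(hasattr(unfitted, "lambdas_"))  # the pre-fit object has no lambdas_
print(fitted.lambdas_)                # one lambda per input column
```

The same reasoning applies to any other fitted attribute of a step nested inside a ColumnTransformer: read it through transformers_, not transformers.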
mrgoldtech
    Be aware that you can use attributes `named_steps['xxx']` (for Pipeline) and `named_transformers_['xxx']` (for ColumnTransformer). – glemaitre Dec 26 '19 at 08:54
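
A rough sketch of what that name-based access could look like, again on made-up data with a single numeric branch (the data and column indices are assumptions). The step names used below ("featureunion", "columntransformer", "pipeline", "powertransformer") are the lowercased class names that make_pipeline, make_union and make_column_transformer generate automatically. With three ColumnTransformers in one union, as in the question, I'd expect the generated names to carry numeric suffixes such as "columntransformer-3", so print transformer_list to confirm before relying on them.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Made-up data: 50 rows, 2 positive, skewed numeric columns.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(50, 2))

numeric_pipeline = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    StandardScaler(),
    PowerTransformer(method="yeo-johnson"),
)
preprocess = make_pipeline(
    make_union(
        make_column_transformer((numeric_pipeline, [0, 1]), remainder="drop")
    )
)
preprocess.fit(X)

# Name-based access, one level at a time.
union = preprocess.named_steps["featureunion"]
# transformer_list is a list of (name, transformer) pairs, so dict() gives lookup by name.
column_transformer = dict(union.transformer_list)["columntransformer"]
fitted_pipeline = column_transformer.named_transformers_["pipeline"]
by_name = fitted_pipeline.named_steps["powertransformer"].lambdas_

# Position-based access, as in the answer above, for comparison.
by_position = (
    preprocess.steps[0][1].transformer_list[0][1].transformers_[0][1][2].lambdas_
)

print(np.array_equal(by_name, by_position))
```

Name-based access is longer to type but does not break when a step is inserted or reordered, which the positional indices do.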