I get the error shown below when I try to transform data with my scikit-learn model.
The preprocessor is built and saved as follows:
import os
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

feature_columns_names = [
    'transaction_id', 'created_at', 'amount', 'device_model', 'device_mode',
    'transaction_sum', 'daily_amt_ratio', 'monthly_amt_ratio'
]
label_column = "is_fraud"
# Columns that must not go through the numeric (scaling) pipeline
non_scaled_cols = ['created_at', 'device_model', 'device_mode', 'transaction_id', 'is_fraud']
numeric_features = [col for col in feature_columns_names if col not in non_scaled_cols]
categorical_features = ['device_model', 'device_mode']
numeric_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    StandardScaler(),
)
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="unknown"),
    OneHotEncoder(handle_unknown="ignore"),
)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features),
    ],
    remainder="drop",
)
preprocessor.fit(data)
joblib.dump(preprocessor, os.path.join(args.model_dir, "model.joblib"))
Here is my code for loading and using the model to transform my data:
import os
import joblib
import pandas as pd

feature_columns_dtype = {
    'transaction_id': 'object',
    'created_at': 'object',
    'amount': 'float64',
    'device_model': 'object',
    'device_mode': 'object',
    'transaction_sum': 'float64',
    'daily_amt_ratio': 'float64',
    'monthly_amt_ratio': 'float64',
}
label_column_dtype = {"is_fraud": "int64"}

def merge_two_dicts(x, y):
    z = x.copy()   # start with x's keys and values
    z.update(y)    # modifies z in place with y's keys and values; update() itself returns None
    return z

df = pd.read_csv('s3://data/dataset_sample.csv',
                 header=None,
                 names=feature_columns_names + [label_column],
                 dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype))
if len(df.columns) == len(feature_columns_names) + 1:
    # This is a labelled example; it includes the is_fraud label
    df.columns = feature_columns_names + [label_column]
elif len(df.columns) == len(feature_columns_names):
    # This is an unlabelled example
    df.columns = feature_columns_names
model = joblib.load(os.path.join(model_dir, "model.joblib"))
model.transform(df)
Both the model and the data load correctly, but the last line (calling transform on df) raises this error:
AttributeError: 'ColumnTransformer' object has no attribute '_feature_names_in'
I have made sure that the scikit-learn version I am using matches the version the model was saved with, that the feature names are provided, and that the input data is passed correctly. Any clue what could be causing the error?
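For reference, one way the version claim above could be checked programmatically is to store the library version next to the pickled object and compare it at load time. This is only a minimal sketch under stated assumptions: the helper names `dump_with_version` and `load_with_version_check` are hypothetical, `pickle` stands in for `joblib`, a plain dict stands in for the fitted ColumnTransformer, and the version strings are illustrative.

```python
import os
import pickle
import tempfile


def dump_with_version(obj, path, library_version):
    """Store the object together with the library version that produced it."""
    with open(path, "wb") as f:
        pickle.dump({"version": library_version, "object": obj}, f)


def load_with_version_check(path, current_version):
    """Load the bundle and fail loudly if the saved and current versions differ."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    if bundle["version"] != current_version:
        raise RuntimeError(
            f"saved with version {bundle['version']}, "
            f"but loading with version {current_version}"
        )
    return bundle["object"]


# Demo: a plain dict stands in for the fitted preprocessor (assumption),
# and the version strings are illustrative placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "model.pkl")
    dump_with_version({"stand_in": "preprocessor"}, demo_path, "0.23.2")

    # Same version on both sides: the object round-trips.
    restored = load_with_version_check(demo_path, "0.23.2")

    # Different version at load time: the check raises.
    try:
        load_with_version_check(demo_path, "1.0.2")
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
```

In real use, `sklearn.__version__` would presumably be passed as the version on both the training side and the loading side, so that a mismatch between the two environments surfaces as an explicit error rather than as a missing-attribute error inside transform.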