I'm learning to use scikit-learn pipelines because they look cleaner. To practice, I'm working on the Tabular Playground competition on Kaggle.
I'm trying to follow a pretty simple pipeline: I use a `FunctionTransformer` to add a new column to the dataframe, apply ordinal encoding with `OrdinalEncoder`, and finally fit the data with a `LinearRegression` model.
Here is the code:
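```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder

# Intended: add a 0/1 'weekend' column derived from the 'date' column
def weekfunc(df):
    print(df)  # debug: inspect what the transformer receives
    df = pd.to_datetime(df)
    df['weekend'] = df.dt.weekday
    df['weekend'].replace(range(5), 0, inplace = True)
    df['weekend'].replace([5,6], 1, inplace = True)

get_weekend = FunctionTransformer(weekfunc)

col_trans = ColumnTransformer([
    ('weekend transform', get_weekend, ['date']),
    ('label encoding', OrdinalEncoder(), ['country', 'store', 'product'])
])

pipe = Pipeline([
    ('label encoder', col_trans),
    ('regression', LinearRegression())
])

pipe.fit(X_train, y_train)
```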
But the code breaks on the first step (the `FunctionTransformer`) and gives me the following error:
```
to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
```
This is weird, because I can print inside the function while it runs, and the output shows that the data is in datetime format. Even `get_weekend.transform(X_train['date'])` works as intended. It only fails when all the steps are joined together in the pipeline.