I am confused by the following Pipeline weirdness. Suppose I define a pipeline thus:

pipe = Pipeline([
    ('transformer', ColumnTransformer([('sc', StandardScaler(), [0, 1])])), 
    ('model', LinearRegression())
])

Now define a DataFrame thus:

df = pd.DataFrame(np.random.rand(10, 4))

Now, interestingly, pipe.fit(df[[0, 1, 2]], df[3]) works fine. However, pipe.predict(df[[0, 1, 2]]) does not, while pipe.predict(df[[0, 1]]) does. This seems wrong, since a pipeline is supposed to apply the same transformations in both the fit and predict steps. Am I missing something?
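
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the three calls above. The imports, the fixed seed, and the `attempt` helper are additions for illustration and are not part of the original snippets. It only records whether each call raises, since the outcome may depend on the installed scikit-learn and pandas versions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same pipeline as in the question: scale columns 0 and 1, then regress.
pipe = Pipeline([
    ('transformer', ColumnTransformer([('sc', StandardScaler(), [0, 1])])),
    ('model', LinearRegression()),
])

# Seeded generator is an assumption, used only to make runs repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)))


def attempt(call):
    """Hypothetical helper: run call() and report 'ok' or the exception."""
    try:
        call()
        return "ok"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


# Dict entries are evaluated in order, so the fit happens before the predicts.
results = {
    "fit on columns 0, 1, 2": attempt(lambda: pipe.fit(df[[0, 1, 2]], df[3])),
    "predict on columns 0, 1, 2": attempt(lambda: pipe.predict(df[[0, 1, 2]])),
    "predict on columns 0, 1": attempt(lambda: pipe.predict(df[[0, 1]])),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Printing `sklearn.__version__` and `pd.__version__` alongside the results should make it easier to compare outcomes across environments.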

Alexander L. Hayes
Igor Rivin
  • There might be some mix-up with the brackets. The first option, `pipe.predict(df[[0, 1, 2])`, does not work because a closing `]` is missing (`pipe.predict(df[[0, 1, 2]])` should indeed work); `pipe.predict(df[[0, 1]])`, on the other hand, does not work, as you trained the model with 3 features and you're trying to predict with only 2. – amiola Oct 20 '22 at 09:48
  • @amiola There is certainly a typo in the question, but that is not where the problem comes from, and the question is based on empirical experience. I will edit it more extensively later. – Igor Rivin Oct 20 '22 at 12:58

0 Answers