Pandas idiom for attaching a predictions column to a dataframe

Question

What is the Pandas idiom for attaching the results of a prediction to the dataframe on which the prediction was made?

For example, if I have something like the following (where QualityLog is the result of a statsmodels fit):

qualityTrain = quality_data[some_selection_criterion]
pred1 = QualityLog.predict(qualityTrain)
qualityTrain = pd.concat([qualityTrain, pd.DataFrame(pred1, columns=['Pred1'])], axis=1)

the 'Pred1' values are not aligned correctly with the rest of qualityTrain. If I modify the last line so that it reads

...pd.DataFrame(pred1, columns=['Pred1'], index=qualityTrain.index)...

I get the results I expect.

Is there a better idiom for attaching results to a dataframe where the dataframe may have an arbitrary index?

score 1 · Accepted Answer · answered Mar 19 '14 at 21:09

You can just do

qualityTrain['Pred1'] = pred1

Note that we're (statsmodels) going to have pandas-in, pandas-out for predict pretty soon, so it'll hopefully alleviate some of these pain points.
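As a minimal sketch of why the assignment works where `pd.concat` did not: the frame `quality_data`, its column `x`, and the doubled values standing in for `QualityLog.predict(...)` are all made up for illustration, and only the pandas behavior is the point. Assigning a plain array to a column is positional, while `pd.concat(..., axis=1)` aligns on index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for quality_data; the values are illustrative only.
quality_data = pd.DataFrame({"x": [0.1, 0.4, 0.7, 0.9, 0.2]})

# A boolean selection keeps the original labels, so the index is 1, 2, 3
# rather than 0, 1, 2. The .copy() avoids a SettingWithCopyWarning later.
qualityTrain = quality_data[quality_data["x"] > 0.3].copy()

# Stand-in for QualityLog.predict(qualityTrain): a plain, un-indexed array.
pred1 = np.asarray(qualityTrain["x"]) * 2.0

# concat aligns on index labels: the new frame is labelled 0, 1, 2 while
# qualityTrain is labelled 1, 2, 3, so rows are mismatched and NaNs appear.
misaligned = pd.concat(
    [qualityTrain, pd.DataFrame(pred1, columns=["Pred1"])], axis=1
)

# Assigning an array to a column is positional, so the labels do not matter.
qualityTrain["Pred1"] = pred1
```

The only requirement is that `pred1` has exactly as many elements as `qualityTrain` has rows; otherwise pandas raises a `ValueError`.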

answered Mar 19 '14 at 21:09

jseabold


This works to fix the problem here, but I have another that is related and that mystifies me: [why does `predict` return an un-indexed array](http://stackoverflow.com/q/22580477/656912)? – orome Mar 22 '14 at 16:44
Because it was written before pandas existed and we haven't updated it yet. – jseabold Mar 22 '14 at 21:08
