What is the Pandas idiom for attaching the results of a prediction to the dataframe on which the prediction was made?

For example, if I have something like this (where QualityLog is the result of a statsmodels fit):

qualityTrain = quality_data[some_selection_criterion]
pred1 = QualityLog.predict(qualityTrain)
qualityTrain = pd.concat([qualityTrain, pd.DataFrame(pred1, columns=['Pred1'])], axis=1)

the 'Pred1' values are not aligned correctly with the rest of qualityTrain. If I modify the last line so that it reads

...pd.DataFrame(pred1, columns=['Pred1'], index=qualityTrain.index)...

I get the results I expect.
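The difference between the two versions can be reproduced with a minimal sketch. The data below is made up for illustration, and a plain NumPy array stands in for the output of `QualityLog.predict`; only the `pd.concat` calls mirror the lines above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: selecting rows leaves an index that is not 0..n-1.
quality_data = pd.DataFrame({"x": [10.0, 20.0, 30.0, 40.0]})
qualityTrain = quality_data[quality_data["x"] > 15]  # index labels 1, 2, 3

# Stand-in for the un-indexed array returned by predict (an assumption).
pred1 = np.array([0.2, 0.3, 0.4])

# Without an explicit index the new frame is labelled 0, 1, 2.
# concat aligns on labels, so values land on the wrong rows and gaps become NaN.
misaligned = pd.concat(
    [qualityTrain, pd.DataFrame(pred1, columns=["Pred1"])], axis=1
)

# Reusing the original index keeps each prediction on its own row.
aligned = pd.concat(
    [
        qualityTrain,
        pd.DataFrame(pred1, columns=["Pred1"], index=qualityTrain.index),
    ],
    axis=1,
)

print(misaligned)
print(aligned)
```

In this sketch `misaligned` ends up with four rows, because the two sets of labels only partly overlap, while `aligned` keeps the three rows of `qualityTrain`.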

Is there a better idiom for attaching results to a dataframe when the dataframe may have an arbitrary index?

orome

1 Answer

You can just assign the predictions as a new column:

qualityTrain['Pred1'] = pred1
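As a rough illustration of why this works: a NumPy array carries no index, so pandas assigns it by position. The sketch below uses made-up data and a plain array in place of the real `predict` output; the `.copy()` call is an addition of mine to avoid pandas' warning about setting a column on a subset, and `Pred1_series` is a hypothetical column shown only for contrast.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with a non-default index after row selection.
quality_data = pd.DataFrame({"x": [10.0, 20.0, 30.0, 40.0]})
# .copy() (an assumption, not in the original) makes the subset an independent frame.
qualityTrain = quality_data[quality_data["x"] > 15].copy()

# Stand-in for the un-indexed array returned by predict.
pred1 = np.array([0.2, 0.3, 0.4])

# An array has no labels, so it is assigned row by row, in order.
qualityTrain["Pred1"] = pred1

# A Series is aligned on its labels instead; with a default 0..n-1 index
# it reproduces the misalignment described in the question.
qualityTrain["Pred1_series"] = pd.Series(pred1)

print(qualityTrain)
```

The array length has to match the number of rows in the frame, otherwise pandas raises a `ValueError`.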

Note that we're (statsmodels) going to have pandas-in, pandas-out for predict pretty soon, so it'll hopefully alleviate some of these pain points.

jseabold
  • This works to fix the problem here, but I have another that is related and that mystifies me: [why does `predict` return an un-indexed array](http://stackoverflow.com/q/22580477/656912)? – orome Mar 22 '14 at 16:44
  • Because it was written before pandas existed and we haven't updated it yet. – jseabold Mar 22 '14 at 21:08