
I have a train_df and a test_df, which come from the same original dataframe, but were split up in some proportion to form the training and test datasets, respectively.

Both train and test dataframes have identical structure:

  • A PeriodIndex with daily buckets
  • n columns that represent observed values in those time buckets, e.g. Sales, Price, etc.

I now want to construct a yhat_df, which stores predicted values for each of the columns. In the "naive" case, every value in a yhat_df column is simply the last observed value of that column in the training dataset.

So I go about constructing yhat_df as below:

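import pandas as pd
# Empty frame with the same index and columns as test_df (all NaN)
yhat_df = pd.DataFrame().reindex_like(test_df)
# Fill each column with the last observed training value for that column
yhat_df[train_df.columns[0]].fillna(train_df.tail(1).values[0][0], inplace=True)
yhat_df[train_df.columns[1]].fillna(train_df.tail(1).values[0][1], inplace=True)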

This appears to work, and since I have only two columns, the extra typing is bearable.

I was wondering if there is a simpler way, especially one that does not require me to go column by column.

I tried the following, but it only populates the column values where the PeriodIndex values match. It seems fillna() aligns on the index internally, much like a join():

yhat_df.fillna(train_df.tail(1), inplace=True)

If I could figure out a way for fillna() to ignore index, maybe this would work?
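The alignment behaviour described above can be reproduced with a small sketch. The data here is hypothetical (toy Sales and Price values), and the forecast index deliberately overlaps the last training day so that exactly one row matches:

```python
import numpy as np
import pandas as pd

# Hypothetical toy training data with a daily PeriodIndex.
train_df = pd.DataFrame(
    {"Sales": [10.0, 12.0, 11.0], "Price": [1.5, 1.6, 1.7]},
    index=pd.period_range("2019-01-01", periods=3, freq="D"),
    columns=["Sales", "Price"],
)

# Assumption for illustration: the forecast index starts on the last
# training day (2019-01-03), so one period overlaps with train_df.
yhat_index = pd.period_range("2019-01-03", periods=3, freq="D")
yhat_df = pd.DataFrame(np.nan, index=yhat_index, columns=train_df.columns)

# Passing a DataFrame to fillna() aligns it on index and columns first,
# so only the row whose period matches the last training row gets filled.
filled = yhat_df.fillna(train_df.tail(1))
print(filled)
```

Only the 2019-01-03 row receives values; the later periods stay NaN because train_df.tail(1) has no rows with those labels.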

Cod.ie

1 Answer


You can use fillna with a dictionary to fill each column with a different value, so I think:

yhat_df = yhat_df.fillna(train_df.tail(1).to_dict('records')[0])

should work. But if I understand what you are doing correctly, you could even create the dataframe directly with:

yhat_df = pd.DataFrame(train_df.tail(1).to_dict('records')[0],
                       index=test_df.index, columns=test_df.columns)
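As a rough, self-contained check of both suggestions, here is a minimal sketch on hypothetical toy data (the Sales and Price values are made up, and the all-NaN frame is built with a scalar np.nan rather than reindex_like, purely to keep the example short):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df and test_df.
train_df = pd.DataFrame(
    {"Sales": [10.0, 12.0, 11.0], "Price": [1.5, 1.6, 1.7]},
    index=pd.period_range("2019-01-01", periods=3, freq="D"),
    columns=["Sales", "Price"],
)
test_df = pd.DataFrame(
    {"Sales": [13.0, 14.0], "Price": [1.8, 1.9]},
    index=pd.period_range("2019-01-04", periods=2, freq="D"),
    columns=["Sales", "Price"],
)

# Last training row as a plain {column: value} dict.
# A dict carries no row index, so fillna() has nothing to align rows on.
last_values = train_df.tail(1).to_dict("records")[0]

# Option 1: fill an all-NaN frame that has test_df's index and columns.
empty_df = pd.DataFrame(np.nan, index=test_df.index, columns=test_df.columns)
yhat_fill = empty_df.fillna(last_values)

# Option 2: build the frame directly; the scalars in the dict are
# broadcast down the index, and columns= fixes the column order.
yhat_direct = pd.DataFrame(
    last_values, index=test_df.index, columns=test_df.columns
)

print(yhat_direct)
```

Both frames should hold the last training values (11.0 for Sales, 1.7 for Price) on every test period, with the columns in the same order as test_df.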
Ben.T
  • Amazing! The only side-effect I see is that the resulting yhat_df has the order of columns sorted alphabetically. I am still on Pandas 0.19.2, so don't have the option to use `to_dict(orient='records', into=OrderedDict)`. This should not be a big problem since my code doesn't do anything with column positions. – Cod.ie May 30 '19 at 19:01
  • @Cod.ie if you want the columns in the same order, maybe you can add the parameter `columns` when you create the dataframe, I edited my answer for it. – Ben.T May 30 '19 at 19:05
  • Perfect! Thanks :) – Cod.ie May 30 '19 at 19:22