How to fillna() all columns of a dataframe from a single row of another dataframe with identical structure

Question

I have a train_df and a test_df, which come from the same original dataframe, but were split up in some proportion to form the training and test datasets, respectively.

Both train and test dataframes have identical structure:

A PeriodIndex with daily buckets
n columns that represent observed values in those time buckets, e.g. Sales, Price, etc.

I now want to construct a yhat_df, which stores predicted values for each of the columns. In the "naive" case, every value in a yhat_df column is simply the last observed value of that column in the training dataset.

So I go about constructing yhat_df as below:

import pandas as pd
yhat_df = pd.DataFrame().reindex_like(test_df)
yhat_df[train_df.columns[0]].fillna(train_df.tail(1).values[0][0], inplace=True)
yhat_df[train_df.columns[1]].fillna(train_df.tail(1).values[0][1], inplace=True)

This appears to work, and since I have only two columns, the extra typing is bearable.

I was wondering if there is a simpler way, especially one that does not require going column by column.

I tried the following, but it only populates values where the PeriodIndex values match. It seems fillna() does a join() of sorts on the index internally:

yhat_df.fillna(train_df.tail(1), inplace=True)

If I could figure out a way for fillna() to ignore index, maybe this would work?
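
The index-matching behaviour described above can be reproduced with a small sketch. The column names and values below are made up for illustration, and the empty frame is built directly with a float dtype rather than with reindex_like(); the point is only that a DataFrame passed to fillna() is aligned on the index, so nothing is filled when the train and test periods do not overlap:

```python
import pandas as pd

# Hypothetical toy data: train covers Jan 1-3, the forecast frame covers Jan 4-5.
train_df = pd.DataFrame(
    {"Sales": [10.0, 12.0, 11.0], "Price": [1.5, 1.6, 1.7]},
    index=pd.period_range("2019-01-01", periods=3, freq="D"),
)
test_index = pd.period_range("2019-01-04", periods=2, freq="D")

# An all-NaN frame with the same columns as train_df
yhat_df = pd.DataFrame(index=test_index, columns=train_df.columns, dtype=float)

# The fill values are looked up by index label; the last training period
# (2019-01-03) is not in yhat_df's index, so no cell finds a value.
aligned = yhat_df.fillna(train_df.tail(1))
still_empty = bool(aligned.isna().all().all())
print(still_empty)
```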

Ben.T · Accepted Answer · 2019-05-30T19:05:32.323


You can use fillna with a dictionary to fill each column with a different value, so I think:

yhat_df = yhat_df.fillna(train_df.tail(1).to_dict('records')[0])

should work. But if I understand correctly what you are doing, you could even create the dataframe directly with:

yhat_df = pd.DataFrame(train_df.tail(1).to_dict('records')[0], 
                       index = test_df.index, columns = test_df.columns)
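
As a quick check, here is a minimal sketch of both options on toy data. The column names and numbers are hypothetical, and the empty frame is created with an explicit float dtype (an assumption, not what the question does) so the fill does not depend on how an empty frame is typed:

```python
import pandas as pd

# Hypothetical train/test frames with the same columns and daily PeriodIndex
train_df = pd.DataFrame(
    {"Sales": [10.0, 12.0, 11.0], "Price": [1.5, 1.6, 1.7]},
    index=pd.period_range("2019-01-01", periods=3, freq="D"),
)
test_df = pd.DataFrame(
    {"Sales": [13.0, 9.0], "Price": [1.8, 1.9]},
    index=pd.period_range("2019-01-04", periods=2, freq="D"),
)

# Last training row as a {column name: value} mapping
last_row = train_df.tail(1).to_dict("records")[0]

# Option 1: fill an all-NaN frame; dict keys are matched to column names,
# so the index of train_df plays no role.
empty = pd.DataFrame(index=test_df.index, columns=test_df.columns, dtype=float)
yhat_fill = empty.fillna(last_row)

# Option 2: build the frame directly; the scalar values are broadcast
# down the given index, and `columns` fixes the column order.
yhat_direct = pd.DataFrame(last_row, index=test_df.index, columns=test_df.columns)

print(yhat_direct)
```

Both frames end up with every row equal to the last training row, indexed by the test periods.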

edited May 30 '19 at 19:05

answered May 30 '19 at 18:29


Amazing! The only side-effect I see is that the resulting yhat_df has the order of columns sorted alphabetically. I am still on Pandas 0.19.2, so don't have the option to use `to_dict(orient='records', into=OrderedDict)`. This should not be a big problem since my code doesn't do anything with column positions. – Cod.ie May 30 '19 at 19:01
@Cod.ie if you want the columns in the same order, maybe you can add the parameter `columns` when you create the dataframe, I edited my answer for it. – Ben.T May 30 '19 at 19:05
Perfect! Thanks :) – Cod.ie May 30 '19 at 19:22
