I have a train_df and a test_df, which come from the same original dataframe, but were split up in some proportion to form the training and test datasets, respectively.
Both train and test dataframes have identical structure:
- A PeriodIndex with daily buckets
- n columns that represent observed values in those time buckets, e.g. Sales, Price, etc.
I now want to construct a yhat_df, which stores predicted values for each of the columns. In the "naive" case, each yhat_df column is simply filled with the last observed value of that column in the training dataset.
So I go about constructing yhat_df as below:
import pandas as pd
yhat_df = pd.DataFrame().reindex_like(test_df)
yhat_df[train_df.columns[0]].fillna(train_df.tail(1).values[0][0], inplace=True)
yhat_df[train_df.columns[1]].fillna(train_df.tail(1).values[0][1], inplace=True)
This appears to work, and since I have only two columns, the extra typing is bearable.
I was wondering if there is a simpler way, especially one that does not need me to go column by column.
I tried the following, but that only populates the column values correctly where the PeriodIndex values match. It seems fillna() attempts to do a join() of sorts internally on the index:
yhat_df.fillna(train_df.tail(1), inplace=True)
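To make the behaviour concrete, here is a small reproduction. The column names, dates and values are made up purely for illustration, and the empty frame is built with an explicit NaN fill rather than reindex_like() just to keep the sketch self-contained. Passing a DataFrame as the fill value lines it up on both the row index and the columns, so only rows whose period matches get filled:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names, dates and values are assumptions for illustration.
train_df = pd.DataFrame(
    {"Sales": [10.0, 11.0, 12.0], "Price": [1.0, 1.5, 2.0]},
    index=pd.period_range("2020-01-01", periods=3, freq="D"),
)
test_df = pd.DataFrame(
    {"Sales": [13.0, 14.0], "Price": [2.5, 3.0]},
    index=pd.period_range("2020-01-04", periods=2, freq="D"),
)

# All-NaN frame with the same index and columns as test_df.
yhat_df = pd.DataFrame(np.nan, index=test_df.index, columns=test_df.columns)

# The one-row frame from tail(1) is labelled 2020-01-03, which does not occur
# in yhat_df's index, so nothing is filled.
filled = yhat_df.fillna(train_df.tail(1))

# With an index that does contain 2020-01-03, only that one row is filled.
overlap = pd.DataFrame(np.nan, index=train_df.index, columns=train_df.columns)
overlap = overlap.fillna(train_df.tail(1))
```

In this sketch, filled stays entirely NaN, while overlap has values only in its last row.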
If I could figure out a way for fillna() to ignore the index, maybe this would work?
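One direction that might work, sketched here under assumptions rather than as a definitive answer: take the last training row as a Series with train_df.iloc[-1] instead of a one-row DataFrame. For a DataFrame, fillna() accepts a Series whose labels are matched against the column names, so the row index does not come into play. The toy data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names, dates and values are assumptions for illustration.
train_df = pd.DataFrame(
    {"Sales": [10.0, 11.0, 12.0], "Price": [1.0, 1.5, 2.0]},
    index=pd.period_range("2020-01-01", periods=3, freq="D"),
)
test_df = pd.DataFrame(
    {"Sales": [13.0, 14.0], "Price": [2.5, 3.0]},
    index=pd.period_range("2020-01-04", periods=2, freq="D"),
)

# Last observed training values as a Series keyed by column name.
last_obs = train_df.iloc[-1]

# All-NaN frame shaped like test_df, then filled column by column from the
# Series; the Series labels are matched to columns, not to the row index.
yhat_df = pd.DataFrame(np.nan, index=test_df.index, columns=test_df.columns)
yhat_df = yhat_df.fillna(last_obs)
```

Assigning the result back, rather than using inplace=True, keeps the sketch independent of how a given pandas version handles in-place fills.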