I have a dataframe with two levels of row index, both levels are datetimes.
There are a lot of NA holes in the dataframe that I'd like to fill with the logic: if there is an NA(), then use the last-known value of a previous row with the same inner index.
This fragment shows what I'm expecting:
#forward-filling on multiindex doesn't work as I hoped.
df3.loc['2010-01-31', '2011-01-02'] = 99
df3.ffill()
#ok
assert df3.loc['2010-01-31', '2011-01-02'].values[0] == 99
#fail
assert df3.loc['2010-02-28', '2011-01-02'].values[0] == 99
assert df3.loc['2010-03-31', '2011-01-02'].values[0] == 99
Unfortunately it looks like this is not implemented for multiindex...
This is a one-off situation, not a generalized solution I need. Not sure what a reasonable hack would be.
Here is thecomplete code that generates the sample I experimented with
#create the main dataframe
dt = pd.DatetimeIndex(start='2010-1-1', end = '2010-12-31', freq='m')
dt2 = pd.DatetimeIndex(start='2011-1-1', end = '2011-1-10', freq='d')
mi = pd.MultiIndex.from_product([dt,dt2], names=['assessment_date', 'contract_date'])
df = pd.DataFrame(index=mi)
df['foo']=None
#sub information dfa
dta1 = pd.DatetimeIndex(start='2010-1-1', end = '2010-2-1', freq='m')
dta2 = pd.DatetimeIndex(start='2011-1-1', end = '2012-1-5', freq='d')
mia = pd.MultiIndex.from_product([dta1,dta2], names=['assessment_date', 'contract_date'])
dfa = pd.DataFrame(index=mia)
dfa['foo']="dfa"
#sub information dfb
dtb1 = pd.DatetimeIndex(start='2010-4-1', end = '2010-5-1', freq='m')
dtb2 = pd.DatetimeIndex(start='2011-1-9', end = '2011-1-12', freq='d')
mib = pd.MultiIndex.from_product([dtb1,dtb2], names=['assessment_date', 'contract_date'])
dfb = pd.DataFrame(index=mib)
dfb['foo']="dfb"
#take all the data in dfa, dfb, and put it into the df
df2 = pd.concat([dfa, dfb])
df2 = df2.reindex(df.index.intersection(df2.index))
df3 = df.combine_first(df2)
df3.head(50)
#forward-filling on multiindex doesn't work as I hoped.
df3.loc['2010-01-31', '2011-01-02'] = 99
df3.ffill()
#ok
assert df3.loc['2010-01-31', '2011-01-02'].values[0] == 99
assert df3.loc['2010-02-28', '2011-01-02'].values[0] == 99
assert df3.loc['2010-03-31', '2011-01-02'].values[0] == 99