I have a MultiIndex DataFrame with gappy date values on level 1, like this:
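import random
import numpy as np
import pandas as pd
np.random.seed(456)  # seeds NumPy only; random.sample below is not seeded, so the dates vary per run
j = [(a, b) for a in ['A','B','C'] for b in random.sample(pd.date_range('2018-01-01', periods=100, freq='D').tolist(), 5)]
j.sort()
i = pd.MultiIndex.from_tuples(j, names=['Name','Date'])
# randint's upper bound is exclusive, so 101 matches the deprecated random_integers(0, 100, 15)
df = pd.DataFrame(np.random.randint(0, 101, 15), i, columns=['Vals'])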
# print(df):
                 Vals
Name Date
A    2018-01-01    27
     2018-01-08    43
     2018-03-26    89
     2018-03-29    42
     2018-04-01    28
B    2018-01-02    79
     2018-01-26    60
     2018-02-18    45
     2018-03-11    37
     2018-03-23    92
C    2018-03-17    39
     2018-03-20    81
     2018-03-21    11
     2018-03-27    77
     2018-04-08    69
For each level 0 value, I want to fill in the index level 1 with every calendar date between the min and max date values for that level 0. (This Q&A addresses the scenario of filling in level 1 with the same value set for all level 0 values.)
E.g., for subset = df.loc['A'], I want to insert rows so that subset.index.values == pd.date_range(subset.index.values.min(), subset.index.values.max()).values. I.e., the resulting DataFrame would look like:
                 Vals
Name Date
A    2018-01-01    27
     2018-01-02   NaN
     2018-01-03   NaN
     2018-01-04   NaN
     2018-01-05   NaN
     2018-01-06   NaN
     2018-01-07   NaN
     2018-01-08    43
     2018-01-09   NaN
...
Is there a pandaic way to accomplish this?
(The best I can come up with is to inefficiently and iteratively append a new DataFrame for each level 0 value, or similarly to iteratively construct a list of index values and then pandas.concat them with the original DataFrame.)
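For reference, here is a minimal sketch of one way the per-group fill could be written, offered as an illustration rather than as the canonical pandas idiom. It uses a small hand-built frame (the dates and values are made up for the example) and a hypothetical helper named fill_dates: each Name group is reindexed onto its own min-to-max daily range, and the pieces are concatenated once at the end.

```python
import pandas as pd

# Small hypothetical frame with the same index layout as in the question.
idx = pd.MultiIndex.from_tuples(
    [
        ("A", pd.Timestamp("2018-01-01")),
        ("A", pd.Timestamp("2018-01-04")),
        ("B", pd.Timestamp("2018-01-02")),
        ("B", pd.Timestamp("2018-01-03")),
        ("B", pd.Timestamp("2018-01-06")),
    ],
    names=["Name", "Date"],
)
df = pd.DataFrame({"Vals": [27, 43, 79, 60, 45]}, index=idx)


def fill_dates(group):
    # fill_dates is a hypothetical helper, not a pandas function.
    # Drop the Name level so the group is indexed by Date only, then
    # reindex onto every calendar day between this group's min and max date.
    g = group.droplevel("Name")
    full = pd.date_range(g.index.min(), g.index.max(), freq="D", name="Date")
    return g.reindex(full)


# Build one filled piece per Name and concatenate them in a single call;
# the dict keys become the outer index level.
filled = pd.concat(
    {name: fill_dates(group) for name, group in df.groupby(level="Name")},
    names=["Name", "Date"],
)
print(filled)
```

This still iterates over the level 0 values in Python, but it builds each piece once and calls pd.concat a single time instead of appending repeatedly. Missing dates come back as NaN, so Vals is upcast to float.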