I am still getting acquainted with pandas and am trying to understand how to fill in gaps in a time series when the DataFrame has a MultiIndex that contains a PeriodIndex level. I used this question as a guide.
Here is a sample DataFrame:
df = pd.DataFrame({'R': ['R1'] * 4 + ['R2'] * 2,
                   'B': ['B1'] * 2 + ['B2'] * 1 + ['B3'] * 2 + ['B1'],
                   'Date': ["2014-09-29",
                            "2014-10-06",
                            "2014-10-13",
                            "2014-10-20",
                            "2014-11-03",
                            "2014-11-10"],
                   'V1': [11, 12, 13, 14, 15, 16],
                   'V2': [20, 19, 18, 17, 16, 15]})
df = df[['R', 'B', 'Date', 'V1', 'V2']]
df
Output:
    R   B        Date  V1  V2
0  R1  B1  2014-09-29  11  20
1  R1  B1  2014-10-06  12  19
2  R1  B2  2014-10-13  13  18
3  R1  B3  2014-10-20  14  17
4  R2  B3  2014-11-03  15  16
5  R2  B1  2014-11-10  16  15
And here is the setup of the PeriodIndex. Note that if a DatetimeIndex is used instead of a PeriodIndex, reset_index works as expected.
df.Date = df['Date'].apply(lambda x: pd.to_datetime(x))
df.Date = df['Date'].apply(lambda x: x.to_period('W'))
df['Date'] = pd.PeriodIndex(df['Date'], freq='W') # <-- This is the line that gives problems at reset_index stage
# df['Date'] = pd.DatetimeIndex(df['Date']) # <-- No problems with reset_index if this line is used instead of the above
print '1.type of df.Date: {}'.format(type(df.Date))
print '2.df.Date.dtype: {}'.format(df.Date.dtype)
print '3.type of df.Date[0]: {}'.format(type(df.Date[0]))
Output:
1.type of df.Date: <class 'pandas.core.series.Series'>
2.df.Date.dtype: object
3.type of df.Date[0]: <class 'pandas.tseries.period.Period'>
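For comparison, here is a minimal sketch of the same conversion written as a single vectorized step. It assumes a recent pandas release (the `.dt` accessor is not available in 0.14.1), where the resulting column gets a dedicated period dtype instead of `object`. The variable name `dates` is only illustrative.

```python
import pandas as pd

# Same date strings as in the sample DataFrame above.
dates = pd.Series(["2014-09-29", "2014-10-06", "2014-10-13",
                   "2014-10-20", "2014-11-03", "2014-11-10"])

# Parse the strings to timestamps, then convert each one to its weekly period.
weekly = pd.to_datetime(dates).dt.to_period('W')

print(weekly.dtype)
print(weekly.iloc[0])
```

On such versions the dtype check in step 2 above would no longer report `object`, which may or may not affect the reset_index behavior described below.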
The rest of the code is almost identical to the post mentioned above:
df.set_index(['R', 'B', 'Date'], inplace=True)
df = df.unstack(['R', 'B']).stack(['R', 'B'], dropna=False) # don't drop NANs
df.reset_index(inplace=True)
And here is the error:
d:\Anaconda\envs\py2k\lib\site-packages\pandas\core\frame.pyc in _sanitize_column(self, key, value)
2139 elif isinstance(value, Index) or _is_sequence(value):
2140 if len(value) != len(self.index):
-> 2141 raise ValueError('Length of values does not match length of '
2142 'index')
2143
ValueError: Length of values does not match length of index
In fact, resetting the index on the levels other than the PeriodIndex level, with df.reset_index(level=[1,2], inplace=True), works as expected.
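Since the DatetimeIndex variant works, one possible way to sidestep the problem is to keep Date as plain timestamps while filling the gaps and to convert to weekly periods only at the very end. The sketch below is an assumption-laden illustration, not a fix for reset_index itself: it targets a recent pandas release, and it swaps the unstack/stack round trip for a reindex against the full product of the index levels. The names `indexed`, `full_index` and `filled` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'R': ['R1'] * 4 + ['R2'] * 2,
    'B': ['B1'] * 2 + ['B2'] + ['B3'] * 2 + ['B1'],
    'Date': pd.to_datetime(["2014-09-29", "2014-10-06", "2014-10-13",
                            "2014-10-20", "2014-11-03", "2014-11-10"]),
    'V1': [11, 12, 13, 14, 15, 16],
    'V2': [20, 19, 18, 17, 16, 15],
})

# Date stays datetime64 here, so the index holds no Period values.
indexed = df.set_index(['R', 'B', 'Date'])

# Every (R, B, Date) combination built from the observed level values.
full_index = pd.MultiIndex.from_product(indexed.index.levels,
                                        names=indexed.index.names)

# Missing combinations become rows of NaN; then flatten the index.
filled = indexed.reindex(full_index).reset_index()

# Convert to weekly periods only after the index has been reset.
filled['Date'] = filled['Date'].dt.to_period('W')
```

Note that this only fills combinations of dates that already occur in the data; a week that is absent for every (R, B) pair would still be missing.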
So the question is: is this actually the expected behavior of reset_index when it is applied to a PeriodIndex that is part of a MultiIndex, or is there something in the code that needs to be fixed?
Thanks in advance. Technical info: Python 2.7.8, IPython 2.3.0, pandas 0.14.1, numpy 1.9.0; compiler: MSC v.1500 64 bit (AMD64); system: Windows; release: 8; machine: AMD64; interpreter: 64bit