I am still getting acquainted with pandas and am trying to understand how to fill gaps in a time series when using a DataFrame whose MultiIndex contains a PeriodIndex. I used this question as a guide.

Here is a sample DataFrame:

df = pd.DataFrame({'R': ['R1'] * 4 + ['R2'] * 2,
'B': ['B1'] * 2 + ['B2'] * 1 + ['B3'] * 2 + ['B1'],
'Date': ["2014-09-29",
"2014-10-06",
"2014-10-13",
"2014-10-20",
"2014-11-03",
"2014-11-10"],
'V1': [11, 12, 13,  14,  15, 16],
'V2': [20, 19, 18,  17,  16, 15]})
df = df[['R','B','Date','V1', 'V2']]
df

Output:

    R   B        Date  V1  V2
0  R1  B1  2014-09-29  11  20
1  R1  B1  2014-10-06  12  19
2  R1  B2  2014-10-13  13  18
3  R1  B3  2014-10-20  14  17
4  R2  B3  2014-11-03  15  16
5  R2  B1  2014-11-10  16  15

And here is the setup of the PeriodIndex. Note that if a DatetimeIndex is used instead of a PeriodIndex, reset_index works as expected.

df.Date = df['Date'].apply(lambda x: pd.to_datetime(x))
df.Date = df['Date'].apply(lambda x: x.to_period('W'))
df['Date'] = pd.PeriodIndex(df['Date'], freq='W') # <-- This is the line that gives problems at reset_index stage
# df['Date'] = pd.DatetimeIndex(df['Date']) # <-- No problems with reset_index if this line is used instead of the above
print('1.type of df.Date: {}'.format(type(df.Date)))
print('2.df.Date.dtype: {}'.format(df.Date.dtype))
print('3.type of df.Date[0]: {}'.format(type(df.Date[0])))

Output:

1.type of df.Date: <class 'pandas.core.series.Series'>
2.df.Date.dtype: object
3.type of df.Date[0]: <class 'pandas.tseries.period.Period'>

The rest of the code is almost identical to the post mentioned above:

df.set_index(['R', 'B', 'Date'], inplace=True)
df = df.unstack(['R', 'B']).stack(['R', 'B'], dropna=False)  # don't drop NANs 
df.reset_index(inplace=True)

And here is the error:

d:\Anaconda\envs\py2k\lib\site-packages\pandas\core\frame.pyc in _sanitize_column(self, key, value)
   2139         elif isinstance(value, Index) or _is_sequence(value):
   2140             if len(value) != len(self.index):
-> 2141                 raise ValueError('Length of values does not match length of '
   2142                                  'index')
   2143 

ValueError: Length of values does not match length of index

In fact, resetting the index on the levels other than the PeriodIndex, with df.reset_index(level=[1,2], inplace=True), works as expected.
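Since a DatetimeIndex does not trigger the error, one possible workaround is to keep Date as plain datetimes while reshaping and convert to weekly periods only after the index has been reset. The sketch below is a minimal illustration under assumptions, not a confirmed fix: it fills the gaps with reindex over the full product of the observed R, B and Date values (a different technique from the unstack/stack call above, so the number of rows may differ), the names indexed, full_index and filled are made up for the example, and it has not been verified against pandas 0.14.1.

```python
import pandas as pd

df = pd.DataFrame({
    'R': ['R1'] * 4 + ['R2'] * 2,
    'B': ['B1'] * 2 + ['B2'] + ['B3'] * 2 + ['B1'],
    'Date': pd.to_datetime(["2014-09-29", "2014-10-06", "2014-10-13",
                            "2014-10-20", "2014-11-03", "2014-11-10"]),
    'V1': [11, 12, 13, 14, 15, 16],
    'V2': [20, 19, 18, 17, 16, 15],
})

# Keep Date as datetime64 while it is part of the MultiIndex.
indexed = df.set_index(['R', 'B', 'Date'])

# Every combination of the observed R, B and Date values
# (hypothetical helper name; this is the full cartesian product).
full_index = pd.MultiIndex.from_product(indexed.index.levels,
                                        names=indexed.index.names)

# Combinations missing from the data become rows of NaN.
filled = indexed.reindex(full_index).reset_index()

# Convert to weekly periods only after the index has been reset.
filled['Date'] = pd.DatetimeIndex(filled['Date']).to_period('W')
```

The design choice here is simply to postpone the period conversion until Date is an ordinary column again, so that reset_index never sees a PeriodIndex level.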

So the question is: is this actually the expected behavior of reset_index when applied to a PeriodIndex that is part of a MultiIndex, or is there something in the code that needs to be fixed?

Thanks in advance.

Technical info: Python 2.7.8, IPython 2.3.0, pandas 0.14.1, numpy 1.9.0; compiler: MSC v.1500 64 bit (AMD64); system: Windows; release: 8; machine: AMD64; interpreter: 64bit

Primer
  • This is fixed in 0.15.0 (releasing very soon). Here was the fix: https://github.com/pydata/pandas/pull/7802 – Jeff Oct 05 '14 at 22:00
  • This question appears to be off-topic because it is about a (fixed) bug in pandas 0.14.1; this should be a pandas GitHub issue. – Andy Hayden Oct 06 '14 at 03:05
  • Thanks for the tip, Jeff. Indeed, the patch you linked solved the issue. – Primer Oct 06 '14 at 21:48
