Why is pandas time series resample raising IncompatibleFrequency error?

Question

The problem

I have a pandas DataFrame holding five years of time series data starting in 2006. Its index is a PeriodIndex built from the periods returned by pd.period_range(), as shown in the code block below.

I want to resample() the first four years, and I've used the time series offset aliases mentioned in the docs. With freq=1W it works, but with a frequency of 2W (and likewise 3W) I get an error that says

IncompatibleFrequency: Input has different freq=2W-SUN from PeriodIndex(freq=W-SUN)

This error is mentioned in the Periods section of the time series docs, which says:

Adding and subtracting integers from periods shifts the period by its own frequency. Arithmetic is not allowed between Period with different freq (span).

Honestly, I'm not sure how this relates to my issue.

In general, if I pass freq=XY (a multiple X of a base frequency Y), the error reads Input has different freq=XY from PeriodIndex(freq=Y), unless X is 1.

The data

The original dataset comes from a CSV file with multiple columns, but in this example I use only a single column A with the same number of rows.

import numpy as np
import pandas as pd
# dummy DataFrame with 87648 rows
df = pd.DataFrame(dict(A=np.random.randint(1, 101, size=87648)))
# Add periods column, set as index
df['time'] = pd.period_range(start='2006-01-01 00:30', freq='30min', end='2011-01-01')
df = df.set_index('time')

If I now type df.index in, e.g., IPython, I get the following output:

PeriodIndex(['2006-01-01 00:30', '2006-01-01 01:00', '2006-01-01 01:30',
             '2006-01-01 02:00', '2006-01-01 02:30', '2006-01-01 03:00',
             '2006-01-01 03:30', '2006-01-01 04:00', '2006-01-01 04:30',
             '2006-01-01 05:00',
             ...
             '2010-12-31 19:30', '2010-12-31 20:00', '2010-12-31 20:30',
             '2010-12-31 21:00', '2010-12-31 21:30', '2010-12-31 22:00',
             '2010-12-31 22:30', '2010-12-31 23:00', '2010-12-31 23:30',
             '2011-01-01 00:00'],
            dtype='period[30T]', name='time', length=87648, freq='30T')

This matches my expectations and the data in the CSV file it was loaded from:

There are 87648 rows.
The first timestamp is 2006-01-01 00:30.
The last timestamp is 2011-01-01 00:00.
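
These three checks can be reproduced directly from the index definition. A minimal sketch, assuming the same pd.period_range() call as in the example above (the name idx is only for illustration):

```python
import pandas as pd

# Rebuild the index with the same arguments as in the example.
idx = pd.period_range(start="2006-01-01 00:30", end="2011-01-01", freq="30min")

print(len(idx))   # number of half-hour periods
print(idx[0])     # first period in the index
print(idx[-1])    # last period in the index
```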

The attempt(s)

# This works
df['A'].loc['2006':'2009'].resample('1W').mean().plot()

# This raises the error mentioned above
df['A'].loc['2006':'2009'].resample('2W').mean().plot()

Further:

I have the same problem if I try to use freq=6M, but it works if I do freq=1M. (Input has different freq=6M from PeriodIndex(freq=M))
It also fails with 7D, which I expected to be equivalent to 1W.

Additional thoughts

There are obviously situations where certain periods won't work, but for half-hourly data spanning several years, I'd expect to be able to resample to any lower frequency, such as an arbitrary number of hours, days, weeks or months.

According to this answer, the following is a better approach:

df['A'].resample('D').interpolate()[::7]

but that gives me an InvalidIndexError: Reindexing only valid with uniquely valued Index objects. (I assume there are duplicate index values at the hour where the clock goes from summer to winter time under daylight saving time.)
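
One way to test that assumption is to inspect the index for duplicates. The sketch below uses a tiny made-up series with one repeated timestamp, not my real data; the index built with pd.period_range() in the example above should already be unique, so this only matters for the data loaded from the CSV file.

```python
import pandas as pd

# Hypothetical sample: the 02:00 timestamp appears twice, as it could
# when local clocks are set back one hour.
idx = pd.to_datetime(["2006-10-29 02:00", "2006-10-29 02:30", "2006-10-29 02:00"])
s = pd.Series([1, 2, 3], index=idx)

# False means at least one index value occurs more than once.
print(s.index.is_unique)

# Show every row whose index value is duplicated.
dupes = s[s.index.duplicated(keep=False)]
print(dupes)

# One possible cleanup: keep only the first occurrence of each timestamp.
deduped = s[~s.index.duplicated(keep="first")]
print(deduped.index.is_unique)
```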

Also, I'm under the impression that pandas aims to do such "heavy lifting" for us, and I assume that a deeper understanding would let users rely on it without such workarounds.

There are several posts on SO about resampling, but searching for "IncompatibleFrequency" and "Input has different freq" turned up no other posts on this error.

The question

I would like to understand why the error is raised, and how to resolve the issue of resampling to arbitrary periods - or at least to understand the limitations.

score 1 · Accepted Answer · answered Feb 21 '18 at 01:13

This is a bug with plot(), not resample(), and has been reported on GitHub (#14763).

As a workaround until the bug is fixed, you can convert your index to a DatetimeIndex with to_timestamp prior to plotting:

df.loc['2006':'2009', 'A'].resample('2W').mean().to_timestamp().plot()
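
For a self-contained version, here is a sketch that rebuilds the dummy data from the question and converts to timestamps before resampling rather than after. That ordering is a variation on the line above, not the same call: it avoids resampling on a PeriodIndex altogether, and the bin edges may differ slightly from period-based resampling. The names s, ts and two_weekly are illustrative.

```python
import numpy as np
import pandas as pd

# Dummy data shaped like the question's: half-hourly periods over five years.
periods = pd.period_range(start="2006-01-01 00:30", end="2011-01-01", freq="30min")
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 101, size=len(periods)), index=periods, name="A")

# Convert the PeriodIndex to a DatetimeIndex (start of each period by default).
ts = s.to_timestamp()

# Resample the first four years into two-week bins on the DatetimeIndex.
two_weekly = ts.loc["2006":"2009"].resample("2W").mean()

print(two_weekly.head())
# two_weekly.plot() can then be called as usual if matplotlib is installed.
```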

Note that you may want to adjust the freq or how parameters of to_timestamp. See the docs for additional details on those parameters.
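
As a rough illustration of the how parameter, the sketch below converts a small made-up weekly series (not the question's data) with how="start" and how="end". The exact time of day of the "end" timestamp can vary between pandas versions, so only the dates are worth relying on.

```python
import pandas as pd

# Two weekly periods ending on Sunday; the first covers 2006-01-02 to 2006-01-08.
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.period_range("2006-01-02", periods=2, freq="W-SUN"),
)

# Map each period to its first instant ...
start = weekly.to_timestamp(how="start")
# ... or to its last instant.
end = weekly.to_timestamp(how="end")

print(start.index)
print(end.index)
```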

root

Thanks, that works nicely. The bug is from 2016, and the issue thread didn't seem to be solving it or giving any clues about where to start on fixing it. – Thomas Fauskanger Feb 21 '18 at 10:04