
I need to represent a sequence of events. These events are a little unusual in that they are:

  • non-contiguous
  • non-overlapping
  • of irregular duration

For example:

  • 1200 - 1203
  • 1210 - 1225
  • 1304 - 1502

I would like to represent these events using pandas.PeriodIndex, but I can't figure out how to create Period objects with irregular durations.

I have two questions:

  1. Is there a way to create Period objects with irregular durations using existing Pandas functionality?
  2. If not, could you suggest how to modify Pandas in order to provide irregular duration Period objects? (this comment suggests that it might be possible "using custom DateOffset classes with appropriately crafted onOffset, rollforward, rollback, and apply methods")

Notes

  1. The docstring for Period suggests that it is possible to specify arbitrary durations like 5T for "5 minutes". I believe this docstring is incorrect: running pd.Period('2013-01-01', freq='5T') raises ValueError: Only mult == 1 supported. I have reported this issue.
  2. The "time stamps vs time spans" section in the Pandas documentation states "For regular time spans, pandas uses Period objects for scalar values and PeriodIndex for sequences of spans. Better support for irregular intervals with arbitrary start and end points are forthcoming in future releases." (my emphasis)
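Regarding note 1: later pandas releases appear to accept multiplied frequencies for Period and period_range. The sketch below assumes a recent pandas and uses the 'min' alias, because newer versions deprecate 'T'. Note that every Period built this way still has the same fixed length, so this alone does not give irregular durations.

```python
import pandas as pd

# A single Period spanning 5 minutes (a multiplied frequency).
p = pd.Period('2013-01-01 00:00', freq='5min')

# start_time is inclusive; end_time is the last nanosecond of the span,
# so add 1 ns back to get the full length.
span = p.end_time - p.start_time + pd.Timedelta(1, unit='ns')

# Adding 1 moves to the next 5-minute period.
next_start = (p + 1).start_time

# A regular sequence of 5000-minute periods, as asked about in one of the comments.
idx = pd.period_range('2013-01-01', periods=3, freq='5000min')
```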

Update 1

Building a Period with a custom duration looks pretty straightforward. BUT I think the main stumbling block will be persuading PeriodIndex to accept Periods with different freqs. e.g.:

import pandas as pd

pd.PeriodIndex([pd.Period('2000', freq='D'),
                pd.Period('2001', freq='T')])
# raises: ValueError: 2001-01-01 00:00 is wrong freq

It looks like a central assumption in PeriodIndex is that every Period has the same freq.
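Later pandas versions provide pd.IntervalIndex, which has no shared freq and can hold spans of arbitrary length. The sketch below reuses the example events from the top of the question, and it assumes (hypothetically) that "1200 - 1203" etc. are HH:MM clock times on an arbitrary day, 2013-01-01.

```python
import pandas as pd

# Hypothetical reading of the example events as HH:MM clock times on 2013-01-01.
starts = pd.to_datetime(['2013-01-01 12:00', '2013-01-01 12:10', '2013-01-01 13:04'])
ends = pd.to_datetime(['2013-01-01 12:03', '2013-01-01 12:25', '2013-01-01 15:02'])

# Each interval is [start, end); unlike a PeriodIndex, there is no single freq.
events = pd.IntervalIndex.from_arrays(starts, ends, closed='left')

# True when the intervals are sorted and none of them overlap.
ok = events.is_non_overlapping_monotonic

# Per-event durations: 3 min, 15 min and 1 h 58 min.
durations = events.length

# Position of the event that contains a given instant.
pos = events.get_loc(pd.Timestamp('2013-01-01 12:15'))
```

Each interval keeps its own length. is_non_overlapping_monotonic checks the "non-overlapping" property listed in the question.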

Jack Kelly
  • I think it's best to keep periods periodic, i.e., regular. We might call what you are looking for, say, a "time span." How would a period help here? Could you, for example, make a 'start' column and an 'end' column for each of your spans? Please back up and explain what you are trying to accomplish with your data. – Dan Allan Aug 28 '13 at 13:05
  • Hi Dan. Thanks for the very quick reply. Your suggestion is very similar to what I'm now planning to implement: I plan to use a `DataFrame`. Each row will represent a single event. The index will represent the start time of each event, and there will be an `end` column to represent the end time of each event. To back up and explain my end goal: I'm writing a 'feature detector' which runs through a timeseries dataset and identifies 'features' in this raw data which can last varying durations. – Jack Kelly Aug 28 '13 at 13:27
  • I have a similar problem: although my periods are regular, I cannot create periods with the durations I need, e.g. '5000T'. – Graeme Stuart Jul 09 '14 at 15:56
  • I found this in the Pandas docs. This problem seems to be known. "For regular time spans, pandas uses Period objects for scalar values and PeriodIndex for sequences of spans. Better support for irregular intervals with arbitrary start and end points are forth-coming in future releases." http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-stamps-vs-time-spans – orange Dec 12 '15 at 06:50
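Jack Kelly's plan (start time as the index, an `end` column for the end time) can be sketched as follows. The timestamps reuse the question's example events, read (as an assumption) as clock times on an arbitrary day. `overlapping` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

starts = pd.to_datetime(['2013-01-01 12:00', '2013-01-01 12:10', '2013-01-01 13:04'])
ends = pd.to_datetime(['2013-01-01 12:03', '2013-01-01 12:25', '2013-01-01 15:02'])

# One row per event: the index holds the start time, 'end' holds the end time.
events = pd.DataFrame({'start': starts, 'end': ends}).set_index('start')

# Irregular per-event durations (index aligned on both sides of the subtraction).
events['duration'] = events['end'] - events.index.to_series()

def overlapping(events, window_start, window_end):
    """Hypothetical helper: events that overlap [window_start, window_end)."""
    mask = (events['end'] > window_start) & (events.index < window_end)
    return events[mask]

hits = overlapping(events,
                   pd.Timestamp('2013-01-01 12:20'),
                   pd.Timestamp('2013-01-01 13:10'))
```

With this window, the second and third events are returned, and the first event (which ends at 12:03) is not.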

2 Answers


A possible solution, depending on the application, is to bin your data: create a PeriodIndex whose frequency equals the smallest unit of time resolution you need to handle your data, then divide the data for each event amongst the bins it covers, leaving the remaining bins null.
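Below is a minimal sketch of this binning approach. It assumes one-minute resolution is fine enough and reuses the question's example events as hypothetical clock times with labels 'a', 'b' and 'c'. Each event's label is written into every minute bin it covers.

```python
import pandas as pd

# Hypothetical events: (start, end, label); each end is treated as exclusive.
events = [('2013-01-01 12:00', '2013-01-01 12:03', 'a'),
          ('2013-01-01 12:10', '2013-01-01 12:25', 'b'),
          ('2013-01-01 13:04', '2013-01-01 15:02', 'c')]

# One bin per minute, assumed to be the finest resolution needed.
bins = pd.period_range('2013-01-01 12:00', '2013-01-01 15:01', freq='min')

# Every bin starts out missing; bins no event touches stay that way.
labels = pd.Series(index=bins, dtype=object)

for start, end, label in events:
    # period_range includes its end point, so stop one minute before
    # the exclusive event end.
    last = pd.Timestamp(end) - pd.Timedelta(minutes=1)
    covered = pd.period_range(start, last, freq='min')
    labels.loc[covered] = label
```

The trade-off is memory: every minute between events gets a (null) bin. The approach therefore suits dense data better than sparse events.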

storn

If your periods have a frequency of minutes, you must pass datetime strings that include the minutes, as follows:

import pandas as pd

pd.PeriodIndex([pd.Period('2000-01-01 00:00', freq='T'),
                pd.Period('2001-01-01 00:00', freq='T')])

The result:

PeriodIndex(['2000-01-01 00:00', '2001-01-01 00:00'], dtype='period[T]', freq='T')
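If the Periods arrive with different frequencies, as in the question's Update 1, one option is to convert each one with Period.asfreq before building the index. The sketch assumes minute resolution is acceptable. This keeps only the starting minute of each span: the daily Period's one-day length is discarded, so it does not solve the irregular-duration problem.

```python
import pandas as pd

# Periods with different frequencies, as in the question's Update 1.
mixed = [pd.Period('2000', freq='D'), pd.Period('2001', freq='min')]

# Convert each one to minute frequency, anchored at the start of its span,
# so that they all share one freq and PeriodIndex accepts them.
common = pd.PeriodIndex([p.asfreq('min', how='start') for p in mixed])
```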
romulomadu