0

How can I get a count of the number of periods in a Pandas DatetimeIndex using a frequency string (offset alias)? For example, let's say I have the following DatetimeIndex:

idx = pd.date_range("2019-03-01", periods=10000, freq='5T')

I would like to know how many 5 minute periods are in a week, or '7D'. I can calculate this "manually":

periods = (7*24*60)//5

Or I can get the length of a dummy index:

len(pd.timedelta_range(start='1 day', end='8 days', freq='5T'))

Neither approach seems very efficient. Is there a better way using Pandas date functionality?

Turanga1
  • 123
  • 1
  • 7

2 Answers2

1

try using numpy

len(np.arange(pd.Timedelta('1 days'), pd.Timedelta('8 days'), timedelta(minutes=5)))

out:
2016

My testing, first import time:

import time

the OP solution:

start_time = time.time()
len(pd.timedelta_range(start='1 day', end='8 days', freq='5T'))
print((time.time() - start_time))

out:
0.0011057853698730469]

using numpy

start_time = time.time()
len(np.arange(pd.Timedelta('1 day'), pd.Timedelta('8 days'), timedelta(minutes=5)))
print((time.time() - start_time))

out:
0.0001723766326904297

Follow the sugestion of @meW, doing the performance test using timeit

using timedelta_range:

%timeit len(pd.timedelta_range(start='1 day', end='8 days', freq='5T'))
out:
91.1 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

using numpy:

%timeit len(np.arange(pd.Timedelta('1 day'), pd.Timedelta('8 days'), timedelta(minutes=5)))
out:
16.3 µs ± 196 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Turanga1
  • 123
  • 1
  • 7
Terry
  • 2,761
  • 2
  • 14
  • 28
  • Terry, the question is about efficiency not the alternatives. May be your solution is right. So, to make it a solid answer try comparing your solution with other alternatives and provide the insights. Hope you get my point! – meW Mar 01 '19 at 13:46
  • @neW But in the scope of OP, `numpy` is much more fast – Terry Mar 01 '19 at 13:55
  • 1
    That's what I just said, if you can prove it with some dummy data that would make it a good answer else no worries! – meW Mar 01 '19 at 13:56
  • For better optimization results, use `%timeit` for one liner or `%%timeit` for multiple lines ;) – meW Mar 01 '19 at 14:03
  • Interesting answers, but shouldn't it be possible to calculate a solution directly rather than generate a dummy index with the sole purpose of determining the length? – Turanga1 Mar 01 '19 at 14:13
  • @Turanga1 probably yes but I do not know how =/ – Terry Mar 01 '19 at 14:21
  • 1
    @meW i appreciate the suggestion, i hope I did it right – Terry Mar 01 '19 at 14:48
0

I finally figured out a reasonable solution:

pd.to_timedelta('7D')//idx.freq

This has the advantage that I can specify a range using a frequency string (offset alias) and the period or frequency is inferred from the dataframe. The numpy solution suggested by @Terry is still the fastest solution where speed is important.

Turanga1
  • 123
  • 1
  • 7