For example, I have a pandas Series:

import numpy as np
import pandas as pd

rng = pd.date_range('2020-12-20', periods=1000000, freq='H')  # hourly index
s = pd.Series(np.random.randn(len(rng)), index=rng)

It is simple to select all rows belonging to the year 2021 with

%timeit -n1 s['2021']

which is super fast and takes only 407 µs ± 193 µs per loop.

Now suppose I want to select all rows at 1 o'clock. The only way I can think of is

%timeit -n1 s[s.index.hour==1]

It is much slower and takes 28.9 ms ± 1.06 ms per loop.

I think there must be a better approach, because if we use the same method to get the rows belonging to the year 2021, that would be

%timeit -n1 s[s.index.year==2021]

It also takes about 28.9 ms.

So what is a better way to select rows by hour, minute, or even second?
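
For anyone reproducing these timings outside IPython, here is a minimal sketch using the standard `timeit` module instead of `%timeit`. The smaller index (100,000 rows), the `candidates` dict, and the repeat counts are my own choices for illustration, so the absolute numbers will not match the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 1,000,000 rows in the question so the script finishes quickly.
# freq=pd.Timedelta(hours=1) is an hourly step, like freq='H' above.
rng = pd.date_range('2020-12-20', periods=100_000, freq=pd.Timedelta(hours=1))
s = pd.Series(np.random.randn(len(rng)), index=rng)

# The three selections from the question, wrapped so timeit can call them.
candidates = {
    "s['2021']": lambda: s['2021'],
    "s[s.index.hour==1]": lambda: s[s.index.hour == 1],
    "s[s.index.year==2021]": lambda: s[s.index.year == 2021],
}

# Best of 3 runs of 10 calls each, reported as seconds per call.
timings = {
    label: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for label, fn in candidates.items()
}

for label, seconds in timings.items():
    print(f"{label:<24} {seconds * 1e3:.3f} ms per call")
```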

user15964

2 Answers

Try between_time:

# pandas >= 1.4: use inclusive='left'; older versions spelled this include_end=False
s.between_time('01:00:00', '02:00:00', inclusive='left')
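
As a quick sanity check, the sketch below compares the half-open `between_time` window with the boolean mask from the question. It assumes pandas 1.4 or later, where `between_time` accepts the `inclusive` keyword; the 10,000-row index is only there to keep the check fast.

```python
import numpy as np
import pandas as pd

# Small hourly index (assumption: 10,000 rows instead of 1,000,000).
rng = pd.date_range('2020-12-20', periods=10_000, freq=pd.Timedelta(hours=1))
s = pd.Series(np.random.randn(len(rng)), index=rng)

# Boolean mask from the question: every timestamp whose hour is 1.
by_mask = s[s.index.hour == 1]

# Half-open window [01:00, 02:00): includes 01:00:00, excludes 02:00:00.
by_between = s.between_time('01:00:00', '02:00:00', inclusive='left')

# Both selections should contain exactly the same rows.
print(by_between.equals(by_mask))
```

Because the window covers the whole hour, this also works when the index has minute or second resolution.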
BENY

You can try via at_time():

s.at_time('01:00:00')

OR

import datetime

s[datetime.time(1)]
#OR
s[datetime.time(1,0,0)]
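
A small sketch to check that these give the same rows. As far as I can tell, both `at_time` and indexing with a `datetime.time` key go through `DatetimeIndex.indexer_at_time`, which would explain why they time about the same. The 10,000-row index and the variable names are just for illustration.

```python
import datetime

import numpy as np
import pandas as pd

# Small hourly index (assumption: 10,000 rows instead of 1,000,000).
rng = pd.date_range('2020-12-20', periods=10_000, freq=pd.Timedelta(hours=1))
s = pd.Series(np.random.randn(len(rng)), index=rng)

one_oclock = datetime.time(1)

by_at_time = s.at_time(one_oclock)   # rows stamped exactly 01:00:00
by_key = s[one_oclock]               # same lookup through __getitem__

# Integer positions of the matching rows, then positional selection.
positions = s.index.indexer_at_time(one_oclock)
by_position = s.iloc[positions]

# Boolean mask from the question, for comparison.
by_mask = s[s.index.hour == 1]

print(by_at_time.equals(by_key), by_at_time.equals(by_position))
print(by_at_time.equals(by_mask))
```

One caveat: `at_time` matches the exact time of day, so on an index with minute or second resolution it returns only the 01:00:00 rows, while `s.index.hour == 1` returns everything from 01:00:00 up to 01:59:59. On the hourly index here the two coincide.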
Anurag Dabas
  • Thank you so much. Is this the best we can do? It still takes 10.5 ms, much slower than `s['2021']`. – user15964 Jul 14 '21 at 00:43
  • @user15964 Updated the answer and added one more way, but I don't think there is anything faster than these two. It may be slower than `s['2021']`, but it is approximately 3 times faster than `s[s.index.hour==1]`. – Anurag Dabas Jul 14 '21 at 03:18
  • Thank you so much. But the new method appears to be the same speed as `at_time` on my computer. – user15964 Jul 14 '21 at 07:27
  • @user15964 Yup, both are approximately the same. By the way, if your query is solved, please consider accepting the answer. If you want a faster method (I don't think there is one beyond these two), consider asking another question. Thanks :) – Anurag Dabas Jul 14 '21 at 10:54