4

I have a pandas dataframe df which has one column constituted by datetime64, e.g.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1471 entries, 0 to 2940
Data columns (total 2 columns):
date    1471  non-null values
id      1471  non-null values
dtypes: datetime64[ns](1), int64(1)

I would like to sub-sample df using as criterion the hour of the day (independently on the other information in date). E.g., in pseudo code

df_sub = df[ (HOUR(df.date) > 8) & (HOUR(df.date) < 20) ]

for some function HOUR.

I guess the problem can be solved via a preliminary conversion from datetime64 to datetime. Can this be handled more efficiently?

Acorbe
  • 8,367
  • 5
  • 37
  • 66

1 Answers1

7

Found a simple solution.

df['hour'] = df.date.apply(lambda x : x.hour)

df_sub = df[(df.hour > 8) & (df.hour) <20]

EDIT:

There is a property dt specifically introduced to handle this problem. The query becomes:

df_sub = df[ (df.date.dt.hour > 8) 
              &  (df.date.dt.hour < 20) ]
Acorbe
  • 8,367
  • 5
  • 37
  • 66
  • With Timestamp objects, this is now :`timestamps = pandas.date_range(start='2012-12-02 00:00:00', end='2012-12-03 23:00:00', freq='H')` and then `timestamps.hour` will return `Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], dtype='int64')` – Nikos Alexandris Dec 04 '22 at 00:51