4

I have a dataframe with timestamps as a column, and I want to filter rows between 8:00:00 and 17:00:00 with np.where. I keep getting error messages about data/object types. Any help would be appreciated.

example:

timestamp    volume
2013-03-01 07:59:00    5
2013-03-01 08:00:00    6
2013-03-01 08:01:00    7
2013-03-01 08:02:00    8

Basically I want to end with:

2013-03-01 08:00:00    6
2013-03-01 08:01:00    7
2013-03-01 08:02:00    8

by using something along the lines of

np.where(df['timestamp'] > dt.time('8:00:00')
yusica
  • Post the error messages you've been getting: they will often tell you what you're doing wrong. – TemporalWolf Feb 13 '17 at 21:09
  • What is the `df['timestamp'].dtype`? – juanpa.arrivillaga Feb 13 '17 at 21:09
  • You're going to want to use `>=` if the second row should make it into the result set. Also, `df[df.timestamp > '08:00:00']` should work whether the dtype is object or datetime. – cmaher Feb 13 '17 at 21:15
  • `df.set_index('timestamp').between_time('08:00','17:00').reset_index()`? – MaxU - stand with Ukraine Feb 13 '17 at 21:20
  • df['timestamp'].dtype shows dtype('O'). I tried df[df.timestamp > '08:00:00'] but no luck – it returns an empty dataframe. np.where(df['timestamp'] > dt.time('8:00:00')) yields: TypeError: descriptor 'time' requires a 'datetime.datetime' object but received a 'str' – yusica Feb 13 '17 at 21:32
  • Yep, I checked and I had the YYYY-mm-dd portion of your timestamp in my index, which threw off my answer. As the answers below suggest, you're probably best off converting `df.timestamp` to datetime, especially if you need to compare a large number of dates. If you can include the date in your string (i.e. you're looking at timestamps for a single day), then you can do `df[df.timestamp >= '2013-03-01 08:00:00']`. – cmaher Feb 13 '17 at 22:12
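Putting the suggestions from the comments together, a minimal sketch of the np.where route might look like this (a sketch only, assuming df is the dataframe from the question and the column parses cleanly with pd.to_datetime; the 8:00-17:00 bounds are the ones from the question):

import datetime as dt

import numpy as np
import pandas as pd

# Convert the object column to datetime64[ns] first; otherwise the .dt accessor fails.
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')

# Compare the time-of-day part against datetime.time objects rather than strings.
in_window = ((df['timestamp'].dt.time >= dt.time(8, 0)) &
             (df['timestamp'].dt.time <= dt.time(17, 0)))

# np.where on a boolean Series returns the matching integer positions.
filtered = df.iloc[np.where(in_window)[0]]

A plain boolean mask (df[in_window]) gives the same rows without going through np.where.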

3 Answers

2

Try this:

In [226]: df
Out[226]:
             timestamp  volume
0  2013-03-01 07:59:00       5
1  2013-03-01 08:00:00       6
2  2013-03-01 08:01:00       7
3  2013-03-01 08:02:00       8

In [227]: df.dtypes
Out[227]:
timestamp    object
volume        int64
dtype: object

In [228]: df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')

In [229]: df.dtypes
Out[229]:
timestamp    datetime64[ns]  # <---- it's `datetime64[ns]` now
volume                int64
dtype: object

In [230]: df.set_index('timestamp').between_time('08:00','17:00').reset_index()
Out[230]:
            timestamp  volume
0 2013-03-01 08:00:00       6
1 2013-03-01 08:01:00       7
2 2013-03-01 08:02:00       8
MaxU - stand with Ukraine
  • I was able to run this, but when I looked at my resulting pivot table I still saw times before 8:00:00. My data contains multiple days, so I wonder if this would work? I printed my resulting data to csv and still see timestamps outside of this period. – yusica Feb 13 '17 at 22:39
  • My mistake, I forgot to assign the updated dataframe back. This worked well. – yusica Feb 14 '17 at 08:20
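To tie that last comment back to the answer: between_time returns a new dataframe, so the result has to be assigned back, roughly along these lines (restating the answer's own steps; between_time only looks at the time-of-day part, so it also works when the data spans multiple days):

df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
# Assign the result back, otherwise df keeps the unfiltered rows.
df = df.set_index('timestamp').between_time('08:00', '17:00').reset_index()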
2

You can use `between`.

I generated a sample dataframe with:

import datetime
import pandas as pd

d = {'timestamp': pd.Series([datetime.datetime.now() +
          datetime.timedelta(hours=i) for i in range(20)]),
     'volume': pd.Series([s for s in range(20)])}
df = pd.DataFrame(d)

df['timestamp'] is

0    2017-02-13 22:37:54.515840
1    2017-02-13 23:37:54.515859
2    2017-02-14 00:37:54.515865
3    2017-02-14 01:37:54.515870
4    2017-02-14 02:37:54.515878
5    2017-02-14 03:37:54.515884
6    2017-02-14 04:37:54.515888
...
17   2017-02-14 15:37:54.515939
18   2017-02-14 16:37:54.515943
19   2017-02-14 17:37:54.515948

df.dtypes

timestamp    datetime64[ns]
volume                int64
dtype: object

As in your example the dtype of df['timestamp'] is object, you can do

df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')

By setting errors='coerce', if the conversion fails for any particular string then those rows are set to NaT.
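If some strings do fail to parse, the resulting NaT rows can be dropped before filtering (an optional step, assuming you want to discard unparseable timestamps rather than inspect them):

# Keep only rows whose timestamp parsed successfully.
df = df.dropna(subset=['timestamp'])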

Then filtering can be done using between as below:

df[df.timestamp.dt.strftime('%H:%M:%S').between('11:00:00','18:00:00')]

will return

13 2017-02-14 11:37:54.515922      13
14 2017-02-14 12:37:54.515926      14
15 2017-02-14 13:37:54.515930      15
16 2017-02-14 14:37:54.515935      16
17 2017-02-14 15:37:54.515939      17
18 2017-02-14 16:37:54.515943      18
19 2017-02-14 17:37:54.515948      19
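As a possible alternative to formatting back to strings, the time-of-day values can also be compared directly as datetime.time objects (a sketch, assuming the timestamp column is already datetime64[ns]):

import datetime

# .dt.time yields datetime.time objects, which between() compares element-wise.
mask = df['timestamp'].dt.time.between(datetime.time(11, 0), datetime.time(18, 0))
df[mask]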
saloua
  • it didn't work as my timestamp dtype was object instead of datetime64[ns]. Got error: Can only use .dt accessor with datetimelike values – yusica Feb 13 '17 at 22:32
  • Can you try `df['date time'] = pd.to_datetime(df['timestamp'], errors='coerce')` before the filtering? – saloua Feb 14 '17 at 06:07
0

If you have a file with data as below:

timestamp    volume
2013-03-01 07:59:00    5
2013-03-01 08:00:00    6
2013-03-01 08:01:00    7
2013-03-01 08:02:00    8

then by skipping that first data row while reading, you will get the output:

timestamp    volume
2013-03-01 08:00:00    6
2013-03-01 08:01:00    7
2013-03-01 08:02:00    8

import pandas as pd

# skiprows=[1] skips the second physical line (the 07:59 row) while keeping the header
df = pd.read_csv("filename", skiprows=[1])
print(df)
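Note that skiprows drops rows purely by their position in the file, so this only works because the unwanted 07:59 row happens to come first. A more general sketch (using the same "filename" placeholder) is to parse the timestamps on read and filter by time of day afterwards:

import pandas as pd

# Parse the timestamp column while reading, then keep only the 08:00-17:00 window.
df = pd.read_csv("filename", parse_dates=['timestamp'])
df = df.set_index('timestamp').between_time('08:00', '17:00').reset_index()
print(df)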
sameer_nubia