
I have the following dataframe:

import pandas as pd
index = pd.date_range('2013-1-1',periods=10,freq='15Min')
data = pd.DataFrame(data=[1,2,3,4,5,6,7,8,9,0], columns=['value'], index=index)

How can I generate a mask based on the index value? I know I can do something like:

data['value'] > 3
Out[40]: 
2013-01-01 00:00:00    False
2013-01-01 00:15:00    False
2013-01-01 00:30:00    False
2013-01-01 00:45:00     True
2013-01-01 01:00:00     True
2013-01-01 01:15:00     True
2013-01-01 01:30:00     True
2013-01-01 01:45:00     True
2013-01-01 02:00:00     True
2013-01-01 02:15:00    False
Freq: 15T, Name: value, dtype: bool

I want to generate a mask to only consider some rows where the index is in a certain range. I was thinking of doing something like data['index'].time() > datetime.time(1,15) to generate a mask. Except of course data['index'] fails because index is not the name of a column. How can you reference the index value for a row in a mask?

BrandonAGr

2 Answers


You can mask using indexer_between_time:

In [11]: data.index.indexer_between_time(start='01:15', end='02:00')
Out[11]: array([5, 6, 7, 8])

In [12]: data.iloc[data.index.indexer_between_time(start='1:15', end='02:00')]
Out[12]:
                     value
2013-01-01 01:15:00      6
2013-01-01 01:30:00      7
2013-01-01 01:45:00      8
2013-01-01 02:00:00      9

As you can see, you access the index by the attribute .index.

Note: by default, both include_start and include_end are True in indexer_between_time. It also offers a tz argument to compare the time to a different timezone.
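Since the question asks for a mask rather than integer positions, here is a minimal sketch of two ways to get a boolean array from the index. It assumes a DatetimeIndex like the one in the question. The variable names (times, mask, mask_from_positions) are illustrative only, and the times are passed positionally to indexer_between_time because keyword names may differ between pandas versions.

```python
import datetime

import numpy as np
import pandas as pd

# Same frame as in the question
index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]}, index=index)

# data.index.time is an array of datetime.time objects, one per row,
# so it can be compared elementwise to build a boolean mask
times = data.index.time
mask = (times >= datetime.time(1, 15)) & (times <= datetime.time(2, 0))
selected = data[mask]

# Alternatively, turn the integer positions returned by
# indexer_between_time into a boolean mask of the same length
positions = data.index.indexer_between_time('01:15', '02:00')
mask_from_positions = np.zeros(len(data), dtype=bool)
mask_from_positions[positions] = True
```

A boolean mask like this can be combined with other conditions, for example data[mask & (data['value'] > 3).to_numpy()].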

Andy Hayden
  • on dft with 2015-08-11 data: dft['2015-08-11 12:00:00':'2015-08-11 12:30:00'] takes 927 microseconds, whereas dft.ix[dft.index.indexer_between_time('12:00', '12:30')] takes 402 and dft.iloc[dft.index.indexer_between_time('12:00', '12:30')] takes 421 microseconds. So indexer_between_time seems 2x faster... (note: in the latest pandas docs, iloc is said to be deprecated, use .ix instead) – comte Aug 15 '15 at 12:17
  • @comte "iloc is said deprecated, use .ix instead" are you sure about that, this seems wrong. You should *prefer* to use iloc (as it's more descriptive... and faster). Can't find mention of this deprecation online. – Andy Hayden Aug 15 '15 at 19:00
  • @Andy, you're correct, I made a mistake in my notes, it's .irow() and .icol() that are deprecated since 0.11 (ref in docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html) – comte Aug 16 '15 at 14:29

The 'start' and 'end' keywords are deprecated. With pandas > 0.17.1, I had to pass the times positionally instead:

data.iloc[data.index.indexer_between_time('1:15', '02:00')]
Out[90]: 
                     value
2013-01-01 01:15:00      6
2013-01-01 01:30:00      7
2013-01-01 01:45:00      8
2013-01-01 02:00:00      9
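
As a version-tolerant sketch (not part of the original answer), the same selection can be written with positional arguments, or with DataFrame.between_time, which filters rows by time of day directly. The names by_position and by_label are illustrative only.

```python
import pandas as pd

# Same frame as in the question
index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]}, index=index)

# Positional arguments avoid depending on the keyword names
by_position = data.iloc[data.index.indexer_between_time('01:15', '02:00')]

# DataFrame.between_time selects the same rows without the iloc step;
# both endpoints are included by default
by_label = data.between_time('01:15', '02:00')
```

Both results hold the four rows from 01:15 through 02:00.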
John Saraceno