Is the following behavior expected or a bug?

I have a process where I need rows from a DataFrame, but at the boundary conditions the simple rule (all rows from the 5 preceding days) will generate selections partially or fully outside the index. I would like pandas to behave like Python and always return a frame, even if sometimes there are no rows.

The index is a PeriodIndex and the data is sorted.

Configuration is pandas 0.12, NumPy 1.7, and 64-bit Windows.

In testing, I found that df.loc raises an index error if the requested slice is not completely within the index.

df[start:end] returned a frame, but not always the rows I expected:

import pandas as pd
october = pd.PeriodIndex(start='20131001', end='20131010', freq='D')
oct_sales = pd.DataFrame(dict(units=[100 + i for i in range(10)]), index=october)

# returns empty frame as desired
oct_sales['2013-09-01':'2013-09-30']

# empty dataframe -- I was expecting two rows
oct_sales['2013-09-30':'2013-10-02']

# works as expected
oct_sales['2013-10-01':'2013-10-02']

# same as oct_sales['2013-10-02':] -- expected no rows
oct_sales['2013-10-02':'2013-09-30']
jonblunt

1 Answer

This is as expected. Slicing on labels (start:end) only works if the labels exist. To get what I think you are after, reindex over the entire period, select, then dropna. That said, the loc behavior of raising is correct, while the [] indexing should work (maybe a bug).

In [23]: idx =  pd.PeriodIndex( start = '20130901', end = '20131010', freq = 'D')

In [24]: oct_sales.reindex(idx)
Out[24]: 
            units
2013-09-01    NaN
2013-09-02    NaN
2013-09-03    NaN
2013-09-04    NaN
2013-09-05    NaN
2013-09-06    NaN
2013-09-07    NaN
2013-09-08    NaN
2013-09-09    NaN
2013-09-10    NaN
2013-09-11    NaN
2013-09-12    NaN
2013-09-13    NaN
2013-09-14    NaN
2013-09-15    NaN
2013-09-16    NaN
2013-09-17    NaN
2013-09-18    NaN
2013-09-19    NaN
2013-09-20    NaN
2013-09-21    NaN
2013-09-22    NaN
2013-09-23    NaN
2013-09-24    NaN
2013-09-25    NaN
2013-09-26    NaN
2013-09-27    NaN
2013-09-28    NaN
2013-09-29    NaN
2013-09-30    NaN
2013-10-01    100
2013-10-02    101
2013-10-03    102
2013-10-04    103
2013-10-05    104
2013-10-06    105
2013-10-07    106
2013-10-08    107
2013-10-09    108
2013-10-10    109

In [25]: oct_sales.reindex(idx)['2013-09-30':'2013-10-02']
Out[25]: 
            units
2013-09-30    NaN
2013-10-01    100
2013-10-02    101

In [26]: oct_sales.reindex(idx)['2013-09-30':'2013-10-02'].dropna()
Out[26]: 
            units
2013-10-01    100
2013-10-02    101
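
The reindex, select, dropna steps above could be wrapped in a small helper so that a window reaching outside the data always comes back as a frame. This is only a sketch: `select_window` is a hypothetical name, not a pandas function, it uses `pd.period_range` (the spelling newer pandas versions expect) rather than the `PeriodIndex(start=..., end=...)` constructor, and it assumes a sorted, non-empty PeriodIndex. Note that `dropna` also discards rows that were already NaN in the original data.

```python
import pandas as pd


def select_window(df, start, end, freq="D"):
    """Hypothetical helper: rows of df between start and end, inclusive.

    Assumes df has a sorted, non-empty PeriodIndex with frequency `freq`.
    """
    p_start = pd.Period(start, freq=freq)
    p_end = pd.Period(end, freq=freq)
    # Build an index covering both the data and the requested window,
    # so that both slice labels are guaranteed to exist after reindexing.
    lo = min(p_start, p_end, df.index.min())
    hi = max(p_start, p_end, df.index.max())
    full = pd.period_range(start=lo, end=hi, freq=freq)
    # Reindex, select by label, then drop the NaN rows added by reindexing.
    return df.reindex(full).loc[p_start:p_end].dropna()


october = pd.period_range(start="2013-10-01", end="2013-10-10", freq="D")
oct_sales = pd.DataFrame({"units": [100 + i for i in range(10)]}, index=october)

partly_outside = select_window(oct_sales, "2013-09-30", "2013-10-02")
fully_outside = select_window(oct_sales, "2013-09-01", "2013-09-30")
reversed_window = select_window(oct_sales, "2013-10-02", "2013-09-30")
```

With the sample data from the question, `partly_outside` should hold the two October rows, while `fully_outside` and `reversed_window` should be empty frames. Reindexing introduces NaN, so the `units` column is converted to float.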
Jeff