I need to define a function that will perform several operations on a dataframe containing a DatetimeIndex. One of these operations is to slice the dataframe based on a period or date passed as one of the function arguments.

When loc is used directly in code, it accepts several forms of label and slice. For instance:

df.loc['2004'] 

to slice all rows with dates in 2004

df.loc['2004-01':'2005-02'] 

to slice all rows with dates between Jan 2004 and Feb 2005

I would like to be able to use only one argument of the function to construct the slice object that goes inside loc[]. Something like:

df.loc[period] 

Where period is the variable passed to the function as one of the arguments, and that can be defined in different formats to be correctly interpreted by the function.

I've tried:

  • Passing a string variable to loc, for instance with a value constructed as "\'2004\'"+':'+"\'2005\'", but it raises a KeyError: "'2002':'2010'".

  • Converting the string to datetime objects using pd.to_datetime, but this converts "2004" to Timestamp('2004-01-01 00:00:00'), a single instant rather than the whole year.

I've found this answer and this answer to be similar, but not specific to what I need.

I could solve this with two function arguments (something like start_date, end_date), but I was wondering if there is any way to achieve it with only one.

U13-Forward
Javgs

1 Answer

The slice built-in should work for this:

# equivalent to df.loc['2004':]
period = slice('2004', None)
df.loc[period]

# equivalent to df.loc['2004-01':'2005-02'] 
period = slice('2004-01', '2005-02')
df.loc[period]
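
As a minimal sketch of how this fits the function described in the question, the example below wraps loc in a helper that takes a single period argument. The name select_period and the sample dataframe are hypothetical and only serve as illustration; the helper relies on loc accepting either a partial date string or a slice object.

```python
import pandas as pd

# Hypothetical sample data: one row per month from Jan 2003 to Dec 2006.
idx = pd.date_range("2003-01-01", "2006-12-01", freq="MS")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)


def select_period(frame, period):
    """Return the rows of `frame` selected by `period` (hypothetical helper).

    `period` may be a partial date string such as "2004", or a slice
    such as slice("2004-01", "2005-02").
    """
    return frame.loc[period]


year_2004 = select_period(df, "2004")                         # all of 2004
jan04_feb05 = select_period(df, slice("2004-01", "2005-02"))  # Jan 2004 to Feb 2005
from_2004 = select_period(df, slice("2004", None))            # 2004 onwards
```

With this approach the caller decides the form of the argument, and the function body stays a single loc call.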
araraonline
  • Thanks, I guess this certainly provides a way to build an internal function that interprets the string argument before it is used inside loc. For instance, if the user only passes '2004' as the argument, meaning all dates in 2004 only, the equivalent would be period=slice('2004','2004'). – Javgs Apr 30 '19 at 12:37
  • I think that should work. Also, if the user only passes '2004', you don't need a slice at all :] – araraonline Apr 30 '19 at 13:34
  • I mean you could do `period='2004'` – araraonline Apr 30 '19 at 13:37
  • You are totally right. I've checked, and both period='2004' and period="2004-10" work in df.loc[period], but a string like "2004:2005" or "\'2004\'"+':'+"\'2005\'" does not. – Javgs Apr 30 '19 at 18:18
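
A string such as "2004:2005" fails because the a:b notation is Python syntax that only creates a slice when written literally inside the brackets; a string containing a colon is passed to loc as one label. If a single string argument is still wanted, one option is to split it before indexing. The sketch below assumes a "start:end" convention and a hypothetical helper named parse_period; neither is part of pandas, and the convention would not suit strings that contain times such as "2004-01-01 12:00".

```python
import pandas as pd


def parse_period(period):
    """Turn "start:end" into slice(start, end); pass anything else through.

    Hypothetical helper. The ":" separator is an assumed convention,
    and an empty side ("2004:" or ":2005") becomes an open end (None).
    """
    if isinstance(period, str) and ":" in period:
        start, end = (part.strip() or None for part in period.split(":", 1))
        return slice(start, end)
    return period


# Hypothetical sample data: one row per month from Jan 2003 to Dec 2006.
idx = pd.date_range("2003-01-01", "2006-12-01", freq="MS")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

two_years = df.loc[parse_period("2004:2005")]  # same as df.loc['2004':'2005']
one_year = df.loc[parse_period("2004")]        # plain string, passed through
```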