6

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'A':['a','b','c'],
        'first_date':['2015-08-31 00:00:00','2015-08-24 00:00:00','2015-08-25 00:00:00']})
df.first_date=pd.to_datetime(df.first_date) #(dtype='<M8[ns]')
df['last_date']=pd.to_datetime('5/6/2016') #(dtype='datetime64[ns]')
df

    A   first_date   last_date
0   a   2015-08-31  2016-05-06
1   b   2015-08-24  2016-05-06
2   c   2015-08-25  2016-05-06

I'd like to create a new column which contains the list (or array) of dates between 'first_date' and 'last_date' which excludes weekends.

So far, I've tried this:

pd.date_range(df['first_date'],df['last_date'])

...but this error occurs:

TypeError: Cannot convert input to Timestamp

I also tried this before pd.date_range...

pd.Timestamp(df['first_date'])

...but no dice.

Thanks in advance!

P.S.:

After this hurdle, I'm going to try looking at other lists of dates and if they fall within the generated array (per row in 'A'), then subtract them out of the list or array). I'll post it as a separate question.

Dance Party
  • 3,459
  • 10
  • 42
  • 67

1 Answers1

10

freq='B' gives you business days, or no weekends.

Your error:

TypeError: Cannot convert input to Timestamp

Is the result of you passing a series to the pd.date_range function when it is expecting a Timestamp

Instead, use apply.

However, I still find it tricky to get lists into specific cells of dataframes. The way I use is to use a pd.Series([mylist]). Notice it is a list of a list. If it were just pd.Series(mylist) pandas would convert the list into a series and you'd get a series of series which is a dataframe.

try:

def fnl(x):
    l = pd.date_range(x.loc['first_date'], x.loc['last_date'], freq='B')
    return pd.Series([l])

df['range'] = df.apply(fnl, axis=1)
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thanks! So how is it that you got around the requirement for a time stamp by using apply on the data frame and taking a series of a list of a list? – Dance Party Jun 06 '16 at 08:15
  • 1
    you were doing `date_range(timeseries1, timeseries2)` when it should have been `date_range(timestamp1, timestampe2)`. By using `apply(axis=1)`, we go row by row, grabbing the timestamps for that row and getting the date range for that row. The series/listOlist thing is just a trick to get the list into a single dataframe cell. We could've done other things as well. – piRSquared Jun 06 '16 at 08:20
  • Thanks again. Follow-up question posted here: http://stackoverflow.com/questions/37653493/pandas-remove-elements-from-datetimeindex-per-list-elements-in-range – Dance Party Jun 06 '16 at 09:08