0

I have a dataframe with a double index (day, time) and would like to create a new column 'Holiday' equal to one if the index day belongs to a list of holidays.

My list of holidays of type DatetimeIndex:

holidays = ['2017-09-11', '2017-12-24']

My original dataframe:

                       Visitor  
Date       Time                                                              
2017-09-11 4:45           0         
           5:00           1        
           5:15          26       
....
2017-09-12 4:45           0       
           5:00           1         
           5:15          26     
....

What I would like to have:

                       Visitor      Holiday  
Date       Time                                                              
2017-09-11 4:45           0           1         
           5:00           1           1         
           5:15          26           1         
....
2017-09-12 4:45           0           0         
           5:00           1           0         
           5:15          26           0        
....

Here is what I tried based on this previous answer:

df['Holiday'] = int(df.index.get_level_values(0) in holidays == True)

However my column 'Holiday' always has the value 0...

Thanks in advance!

cs95
  • 379,657
  • 97
  • 704
  • 746
Batmax
  • 253
  • 8
  • 17

2 Answers2

2

Your current solution should actually throw a ValueError:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Note that the in operator is meant for scalar values and is not meant to be used with pandas dataframes. pandas instead has a lot of conditional and bitwise operators that are overloaded, as well as a comprehensive suite of APIs for the purpose of conditional and boolean logic.


You can use np.where and isin.

df['Holiday'] = np.where(df.index.get_level_values(0).isin(holidays), 1, 0)
df


                 Visitor  Holiday
Date       Time                  
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0
cs95
  • 379,657
  • 97
  • 704
  • 746
2

Use isin by taking the date level from get_level_values and use astype(int) to convert boolean to integer.

In [192]: df['Holiday'] = df.index.get_level_values(0).isin(holidays).astype(int)

In [193]: df
Out[193]:
                 Visitor  Holiday
Date       Time
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0

If you want a copy instead of modifying df

In [196]: df.assign(Holiday=df.index.get_level_values(0).isin(holidays).astype(int))
Out[196]:
                 Visitor  Holiday
Date       Time
2017-09-11 4:45        0        1
           5:00        1        1
           5:15       26        1
2017-09-12 4:45        0        0
           5:00        1        0
           5:15       26        0
Zero
  • 74,117
  • 18
  • 147
  • 154
  • You're lucky it's me and not someone else with an identical solution to yours who posted literally seconds before you did. ;-) – cs95 Sep 15 '17 at 11:47
  • Hmm, somewhere I feel np.where is quite not same as astype(int), and obviously we both were independently at the `isin` part, what was it <30secs? – Zero Sep 15 '17 at 11:49
  • @Batmax You can upvote both solutions, they're essentially the same. – cs95 Sep 15 '17 at 11:50