2

I have a big DataFrame with millions of rows and one of the columns as date. I want to add 5 columns to it which are 0/1 for weekdays.

dates = pd.date_range('1700-01-01', '2017-07-02')
df = pd.DataFrame({'date':dates, 'Values':np.random.normal(size = len(dates))})
df

          date      value
0   1700-01-01  -1.239422
1   1700-01-02  -0.209840
2   1700-01-03   0.146293
3   1700-01-04   1.422454
4   1700-01-05   0.453222
...

I am trying to achieve this as follows:

df['isMonday'] = df.apply(lambda x: 1 if x['date'].weekday() == 0 else 0, axis=1)
df['isTuesday'] = df.apply(lambda x: 1 if x['date'].weekday() == 1 else 0, axis=1)
df['isWednesday'] = df.apply(lambda x: 1 if x['date'].weekday() == 2 else 0, axis=1)
df['isThursday'] = df.apply(lambda x: 1 if x['date'].weekday() == 3 else 0, axis=1)
df['isFriday'] = df.apply(lambda x: 1 if x['date'].weekday() == 4 else 0, axis=1)
df

          date      value  isMonday  isTuesday isWednesday isThursday isFriday
0   1700-01-01  -1.239422        0           0           0          0        1
1   1700-01-02  -0.209840        0           0           0          0        0
2   1700-01-03   0.146293        0           0           0          0        0
3   1700-01-04   1.422454        1           0           0          0        0
4   1700-01-05   0.453222        0           1           0          0        0
...

This is very slow. What would be the most efficient way to achieve this.

Gerry
  • 606
  • 6
  • 16

1 Answers1

2

Try the following:

df.join(pd.get_dummies("is"+df.date.dt.weekday_name))

Should yield:

#      Values       date  isFriday  isMonday  isSaturday  isSunday  isThursday  \
# 0  0.107486 1700-01-01         1         0           0         0           0   
# 1 -0.418628 1700-01-02         0         0           1         0           0   
# 2 -0.336569 1700-01-03         0         0           0         1           0   
# 3  0.322989 1700-01-04         0         1           0         0           0   
# 4 -1.025901 1700-01-05         0         0           0         0           0   
# 5 -0.297524 1700-01-06         0         0           0         0           0   
# 6  0.824080 1700-01-07         0         0           0         0           1   
# 7 -0.286943 1700-01-08         1         0           0         0           0   
# 8 -1.774486 1700-01-09         0         0           1         0           0   
# 9  1.029038 1700-01-10         0         0           0         1           0   

#    isTuesday  isWednesday  
# 0          0            0  
# 1          0            0  
# 2          0            0  
# 3          0            0  
# 4          1            0  
# 5          0            1  
# 6          0            0  
# 7          0            0  
# 8          0            0  
# 9          0            0 

What is happening here is that you're getting the week day names of your datetime values and prepending them with the string "is". Once that is done, you then apply the function pd.get_dummies to the result. pd.get_dummies generates a column for each value from the output of pd.get_dummies and fills matched rows with 1, while the unmatched rows are filled with 0. This will be a new dataframe that you can join to your original dataframe.

Edit:

version 0.16.2 solution:

weekday_names = {0: 'Monday',
                 1: 'Tuesday',
                 2: 'Wednesday',
                 3: 'Thursday',
                 4: 'Friday',
                 5: 'Saturday',
                 6: 'Sunday'}

df.join(pd.get_dummies("is"+df.date.dt.weekday.apply(lambda x: weekday_names.get(x))))

I hope this helps.

Abdou
  • 12,931
  • 4
  • 39
  • 42
  • Thanks again for your response @Abdou. This throws `AttributeError: 'DatetimeProperties' object has no attribute 'weekday_name'`. I am not sure if this again related to version `0.16.2` – Gerry Jul 03 '17 at 00:28
  • Thanks @Abdou. Very useful – Gerry Jul 03 '17 at 22:03