14

I have a dataframe which has Dates and public holidays

Date    WeekNum Public_Holiday
1/1/2015    1   1
2/1/2015    1   0
3/1/2015    1   0
4/1/2015    1   0
5/1/2015    1   0
6/1/2015    1   0
7/1/2015    1   0
8/1/2015    2   0
9/1/2015    2   0
10/1/2015   2   0
11/1/2015   2   0
12/1/2015   2   0
13/1/2015   2   0

I have to create a conditional column named Public_Holiday_Week, which should return 1, if that particular week has a public holiday

And I want to see an output like this

Date    WeekNum Public_Holiday  Public_Holiday_Week
1/1/2015    1   1               1
2/1/2015    1   0               1
3/1/2015    1   0               1
4/1/2015    1   0               1
5/1/2015    1   0               1
6/1/2015    1   0               1
7/1/2015    1   0               1
8/1/2015    2   0               0
9/1/2015    2   0               0
10/1/2015   2   0               0
11/1/2015   2   0               0
12/1/2015   2   0               0
13/1/2015   2   0               0

I tried using np.where

df['Public_Holiday_Week'] = np.where(df['Public_Holiday']==1,1,0)

But it applies 0 for other days of the week when it is not a public holiday.

Do I have to apply rolling here? Appreciate your help

cs95
  • 379,657
  • 97
  • 704
  • 746
Ahamed Moosa
  • 1,395
  • 7
  • 16
  • 30
  • First day of week is `2014-31-12` [link](http://whatweekisit.org/calendar-2015..html)? Why? It is necessary for your data? – jezrael Jun 25 '18 at 06:07
  • df['weekOfYear'] = df['ActivityDate'].dt.week #week of the year df['weekNum'] = df.weekOfYear + (df.year % 2015) * 52 – Ahamed Moosa Jun 25 '18 at 06:55
  • @jezrael This is how I came up with week of the year and week num – Ahamed Moosa Jun 25 '18 at 06:57
  • Thanks, it is exactly what is a bit weird - weeks by `df['weekOfYear'] = df['ActivityDate'].dt.week` and by column `WeekNum` in sample data are different. And it is reason why was editted my solution - `weeks` column is different like `WeekNum`. – jezrael Jun 25 '18 at 07:07
  • @jezrael Yes, apologies for not informing that before hand :) – Ahamed Moosa Jun 25 '18 at 07:09
  • No, not necessary inform me, only I write this comments if solutions are OK, what you need :) – jezrael Jun 25 '18 at 07:11

3 Answers3

8

For improve performance dont use groupby, rather get all WeekNum with at least one 1 and then select values by isin, last cast boolean mask to ints:

weeks = df.loc[df['Public_Holiday'].eq(1), 'WeekNum']
df['Public_Holiday_Week'] = df['WeekNum'].isin(weeks).astype(int)

print (df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0

As pointed @Mohamed Thasin ah if necessary is possible groupby by week, but then get different output, because different week numbers:

df['weeks'] = pd.to_datetime(df['Date'], dayfirst=True).dt.week

weeks = df.loc[df['Public_Holiday'].eq(1), 'weeks']
df['Public_Holiday_Week'] = df['weeks'].isin(weeks).astype(int)
print (df)
         Date  WeekNum  Public_Holiday  weeks  Public_Holiday_Week
0    1/1/2015        1               1      1                    1
1    2/1/2015        1               0      1                    1
2    3/1/2015        1               0      1                    1
3    4/1/2015        1               0      1                    1
4    5/1/2015        1               0      2                    0
5    6/1/2015        1               0      2                    0
6    7/1/2015        1               0      2                    0
7    8/1/2015        2               0      2                    0
8    9/1/2015        2               0      2                    0
9   10/1/2015        2               0      2                    0
10  11/1/2015        2               0      2                    0
11  12/1/2015        2               0      3                    0
12  13/1/2015        2               0      3                    0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
7

Use resample and skip the use of the WeekNum column altogether.

df.assign(
    Public_Holiday_Week=
    df.resample('W-Wed', on='Date').Public_Holiday.transform('max')
)

         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0  2015-01-01        1               1                    1
1  2015-01-02        1               0                    1
2  2015-01-03        1               0                    1
3  2015-01-04        1               0                    1
4  2015-01-05        1               0                    1
5  2015-01-06        1               0                    1
6  2015-01-07        1               0                    1
7  2015-01-08        2               0                    0
8  2015-01-09        2               0                    0
9  2015-01-10        2               0                    0
10 2015-01-11        2               0                    0
11 2015-01-12        2               0                    0
12 2015-01-13        2               0                    0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
4

groupby and max, with map:

df['Public_Holiday_Week'] = df.WeekNum.map(df.groupby('WeekNum').Public_Holiday.max())
print(df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0

groupby and transform, with max

df['Public_Holiday_Week'] = df.groupby('WeekNum').Public_Holiday.transform('max')

Thankfully, this will generalise nicely when grouping by month-year:

df['Public_Holiday_Week'] = (
     df.groupby(['WeekNum', df.Date.str.split('/', 1).str[1]])
      .Public_Holiday.transform('max')
)
print(df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0
Community
  • 1
  • 1
cs95
  • 379,657
  • 97
  • 704
  • 746