2

I'm on the last step of finishing my table, but my count is returning NaN. I'm trying to create a column target_days which groups by each value in 'month_start_date' and returns the value of the count of workday if True.

For example, each date that has a month_start_date of 2018-08-01 should return a value of 23 because there are 23 workdays in that month. Any help is appreciated!

import pandas as pd
import numpy as np
import datetime as dt
from pandas.tseries.holiday import USFederalHolidayCalendar as holidaylist

datelist = pd.date_range(start='2018-08-01', end=dt.datetime.today())
weekdaylist = pd.date_range(start='2018-08-01', end=dt.datetime.today(), freq='B')
mstartlist = (datelist + dt.timedelta(1)) + pd.tseries.offsets.MonthBegin(n=-1)

hl = holidaylist()
holidays = hl.holidays(start=datelist.min(), end=datelist.max())

df = pd.DataFrame(datelist, columns = ['date'])

df['weekday'] = df.date.isin(weekdaylist)

df['holiday'] = df.date.isin(holidays)

df['workday'] = np.where((df.weekday == False) | (df.holiday == True), False, True)

df['month_start_date'] = mstartlist

df['target_days'] = df.groupby('month_start_date')['workday'].apply(lambda x: x[x == True].count())

          date  weekday  holiday  workday month_start_date  target_days
0   2018-08-01     True    False     True       2018-08-01          NaN
1   2018-08-02     True    False     True       2018-08-01          NaN
2   2018-08-03     True    False     True       2018-08-01          NaN
3   2018-08-04    False    False    False       2018-08-01          NaN
4   2018-08-05    False    False    False       2018-08-01          NaN
  • 1
    How about df['target_days'] = df.groupby('month_start_date')['workday'].value_count()` https://stackoverflow.com/questions/53550988/count-occurrences-of-false-or-true-in-a-column-in-pandas – PeptideWitch Jan 14 '21 at 00:39
  • I got an error "'SeriesGroupBy' object has no attribute 'value_count'" but then changed from value_count to value_counts, thinking it would resolve the issue, but then resulted in another error: incompatible index of inserted column with frame index – FormidableData Jan 14 '21 at 00:56
  • you have to change the index of your df and merge on the groupby df.index = df["date"] pd.concat([df, df_group], axis=1, join="outer") – Quantum Dreamer Jan 14 '21 at 00:58

2 Answers2

2

Solution with groupby() and .transform(). Based on the response in this thread,

"As a general rule when using groupby(), if you use the .transform() 
function pandas will return a table with the same length as your 
original. When you use other functions like .sum() or .first() then 
pandas will return a table where each row is a group."

With that as a way to move forward, I was able to get the right solution.

df['target_days'] = df.groupby('month_start_date')['workday'].transform('sum')

The output of this is correct. The sample for month 2021-01-01 shows:

          date  weekday  holiday  workday month_start_date  target_days
884 2021-01-01     True     True    False       2021-01-01            8
885 2021-01-02    False    False    False       2021-01-01            8
886 2021-01-03    False    False    False       2021-01-01            8
887 2021-01-04     True    False     True       2021-01-01            8
888 2021-01-05     True    False     True       2021-01-01            8
889 2021-01-06     True    False     True       2021-01-01            8
890 2021-01-07     True    False     True       2021-01-01            8
891 2021-01-08     True    False     True       2021-01-01            8
892 2021-01-09    False    False    False       2021-01-01            8
893 2021-01-10    False    False    False       2021-01-01            8
894 2021-01-11     True    False     True       2021-01-01            8
895 2021-01-12     True    False     True       2021-01-01            8
896 2021-01-13     True    False     True       2021-01-01            8

The first 100 rows have correct value count for holiday = True:

          date  weekday  holiday  workday month_start_date  target_days
0   2018-08-01     True    False     True       2018-08-01           23
1   2018-08-02     True    False     True       2018-08-01           23
2   2018-08-03     True    False     True       2018-08-01           23
3   2018-08-04    False    False    False       2018-08-01           23
4   2018-08-05    False    False    False       2018-08-01           23
5   2018-08-06     True    False     True       2018-08-01           23
6   2018-08-07     True    False     True       2018-08-01           23
7   2018-08-08     True    False     True       2018-08-01           23
8   2018-08-09     True    False     True       2018-08-01           23
9   2018-08-10     True    False     True       2018-08-01           23
10  2018-08-11    False    False    False       2018-08-01           23
11  2018-08-12    False    False    False       2018-08-01           23
12  2018-08-13     True    False     True       2018-08-01           23
13  2018-08-14     True    False     True       2018-08-01           23
14  2018-08-15     True    False     True       2018-08-01           23
15  2018-08-16     True    False     True       2018-08-01           23
16  2018-08-17     True    False     True       2018-08-01           23
17  2018-08-18    False    False    False       2018-08-01           23
18  2018-08-19    False    False    False       2018-08-01           23
19  2018-08-20     True    False     True       2018-08-01           23
20  2018-08-21     True    False     True       2018-08-01           23
21  2018-08-22     True    False     True       2018-08-01           23
22  2018-08-23     True    False     True       2018-08-01           23
23  2018-08-24     True    False     True       2018-08-01           23
24  2018-08-25    False    False    False       2018-08-01           23
25  2018-08-26    False    False    False       2018-08-01           23
26  2018-08-27     True    False     True       2018-08-01           23
27  2018-08-28     True    False     True       2018-08-01           23
28  2018-08-29     True    False     True       2018-08-01           23
29  2018-08-30     True    False     True       2018-08-01           23
30  2018-08-31     True    False     True       2018-08-01           23
31  2018-09-01    False    False    False       2018-09-01           19
32  2018-09-02    False    False    False       2018-09-01           19
33  2018-09-03     True     True    False       2018-09-01           19
34  2018-09-04     True    False     True       2018-09-01           19
35  2018-09-05     True    False     True       2018-09-01           19
36  2018-09-06     True    False     True       2018-09-01           19
37  2018-09-07     True    False     True       2018-09-01           19
38  2018-09-08    False    False    False       2018-09-01           19
39  2018-09-09    False    False    False       2018-09-01           19
40  2018-09-10     True    False     True       2018-09-01           19
41  2018-09-11     True    False     True       2018-09-01           19
42  2018-09-12     True    False     True       2018-09-01           19
43  2018-09-13     True    False     True       2018-09-01           19
44  2018-09-14     True    False     True       2018-09-01           19
45  2018-09-15    False    False    False       2018-09-01           19
46  2018-09-16    False    False    False       2018-09-01           19
47  2018-09-17     True    False     True       2018-09-01           19
48  2018-09-18     True    False     True       2018-09-01           19
49  2018-09-19     True    False     True       2018-09-01           19
50  2018-09-20     True    False     True       2018-09-01           19
51  2018-09-21     True    False     True       2018-09-01           19
52  2018-09-22    False    False    False       2018-09-01           19
53  2018-09-23    False    False    False       2018-09-01           19
54  2018-09-24     True    False     True       2018-09-01           19
55  2018-09-25     True    False     True       2018-09-01           19
56  2018-09-26     True    False     True       2018-09-01           19
57  2018-09-27     True    False     True       2018-09-01           19
58  2018-09-28     True    False     True       2018-09-01           19
59  2018-09-29    False    False    False       2018-09-01           19
60  2018-09-30    False    False    False       2018-09-01           19
61  2018-10-01     True    False     True       2018-10-01           22
62  2018-10-02     True    False     True       2018-10-01           22
63  2018-10-03     True    False     True       2018-10-01           22
64  2018-10-04     True    False     True       2018-10-01           22
65  2018-10-05     True    False     True       2018-10-01           22
66  2018-10-06    False    False    False       2018-10-01           22
67  2018-10-07    False    False    False       2018-10-01           22
68  2018-10-08     True     True    False       2018-10-01           22
69  2018-10-09     True    False     True       2018-10-01           22
70  2018-10-10     True    False     True       2018-10-01           22
71  2018-10-11     True    False     True       2018-10-01           22
72  2018-10-12     True    False     True       2018-10-01           22
73  2018-10-13    False    False    False       2018-10-01           22
74  2018-10-14    False    False    False       2018-10-01           22
75  2018-10-15     True    False     True       2018-10-01           22
76  2018-10-16     True    False     True       2018-10-01           22
77  2018-10-17     True    False     True       2018-10-01           22
78  2018-10-18     True    False     True       2018-10-01           22
79  2018-10-19     True    False     True       2018-10-01           22
80  2018-10-20    False    False    False       2018-10-01           22
81  2018-10-21    False    False    False       2018-10-01           22
82  2018-10-22     True    False     True       2018-10-01           22
83  2018-10-23     True    False     True       2018-10-01           22
84  2018-10-24     True    False     True       2018-10-01           22
85  2018-10-25     True    False     True       2018-10-01           22
86  2018-10-26     True    False     True       2018-10-01           22
87  2018-10-27    False    False    False       2018-10-01           22
88  2018-10-28    False    False    False       2018-10-01           22
89  2018-10-29     True    False     True       2018-10-01           22
90  2018-10-30     True    False     True       2018-10-01           22
91  2018-10-31     True    False     True       2018-10-01           22
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33
  • 1
    While this answer is correct, I am getting some challenges when I test this on python.org. I dont have my personal laptop to test. I will do some more tests tonight and let you know if there are any changes to be done. Look at results for `2021-01-01` The answer is 8 while there are 9 True statements. Also the answer is 23 for False while there are only 5 False. This is the answer I get on the virtual vm. I will test it on my personal laptop later tonight – Joe Ferndz Jan 14 '21 at 01:19
  • Almost there - it looks like the target days value is being placed into every row, instead of only if that row's workday = True, each month_start_date should have a different value. It looks like it's replicating the August 2018 results for all rows. Also, I'm using [0] because there should be 23 workdays for the month instead of 8 workdays. – FormidableData Jan 14 '21 at 01:26
1

You are almost there...

df.groupby('month_start_date')['workday'].apply(lambda x: x[x == True].count())

returns the count for unique month_start_date which has a size of 30

2018-08-01    23
2018-09-01    19
2018-10-01    22
2018-11-01    20
2018-12-01    20
2019-01-01    21
2019-02-01    19
2019-03-01    21
2019-04-01    22
2019-05-01    22
2019-06-01    20
2019-07-01    22
2019-08-01    22
2019-09-01    20
2019-10-01    22
2019-11-01    19
2019-12-01    21
2020-01-01    21
2020-02-01    19
2020-03-01    22
2020-04-01    22
2020-05-01    20
2020-06-01    22
2020-07-01    22
2020-08-01    21
2020-09-01    21
2020-10-01    21
2020-11-01    19
2020-12-01    22
2021-01-01     9
Name: workday, dtype: int64

We need to merge these values basis the dates in the above obtained Series object with the df['date']


df['target_days'] = pd.merge(df[['date']], workdays_ss, how='left', left_on='date', right_index=True)['workday']

df

Output:

0   2018-08-01  True    False   True    2018-08-01  23.0
1   2018-08-02  True    False   True    2018-08-01  NaN
2   2018-08-03  True    False   True    2018-08-01  NaN
3   2018-08-04  False   False   False   2018-08-01  NaN
4   2018-08-05  False   False   False   2018-08-01  NaN
... ... ... ... ... ... ...
893 2021-01-10  False   False   False   2021-01-01  NaN
894 2021-01-11  True    False   True    2021-01-01  NaN
895 2021-01-12  True    False   True    2021-01-01  NaN
896 2021-01-13  True    False   True    2021-01-01  NaN
897 2021-01-14  True    False   True    2021-01-01  NaN

It is showing NaN in target_days because it matches with the first day of every month (Try running df[df['target_days'].notnull()]). You can tweak the logic if you want target_date populated for all the dates of a month.