3

I have a pandas dataframe with a lot of columns, some of which have values on weekends.

I'm now trying to remove all weekend rows, but need to add the values I remove to the respective following Monday.

Thu: 4
Fri: 5
Sat: 2
Sun: 1
Mon: 4
Tue: 3

needs to become

Thu: 4
Fri: 5
Mon: 7
Tue: 3

I have figured out how to slice only the weekdays (using df.index.dayofweek), but can't think of a clever way to aggregate before doing so.

Here's some dummy code to start:

index = pd.date_range(datetime.datetime.now().date() -
                      datetime.timedelta(20),
                      periods = 20,
                      freq = 'D')

df = pd.DataFrame({
    'Val_1': np.random.rand(20),
    'Val_2': np.random.rand(20),
    'Val_3': np.random.rand(20)
  },
  index = index)

df['Weekday'] = df.index.dayofweek

Any help on this would be much appreciated!

Oli
  • 317
  • 1
  • 2
  • 9

2 Answers2

3

Setup

I included a random seed

np.random.seed([3, 1415])

index = pd.date_range(datetime.datetime.now().date() -
                      datetime.timedelta(20),
                      periods = 20,
                      freq = 'D')

df = pd.DataFrame({
    'Val_1': np.random.rand(20),
    'Val_2': np.random.rand(20),
    'Val_3': np.random.rand(20)
  },
  index = index)

df['day_name'] = df.index.day_name()

df.head(6)

               Val_1     Val_2     Val_3   day_name
2018-07-18  0.444939  0.278735  0.651676  Wednesday
2018-07-19  0.407554  0.609862  0.136097   Thursday
2018-07-20  0.460148  0.085823  0.544838     Friday
2018-07-21  0.465239  0.836997  0.035073   Saturday
2018-07-22  0.462691  0.739635  0.275079     Sunday
2018-07-23  0.016545  0.866059  0.706685     Monday

Solution

I fill in a series of dates with the subsequent Monday for Saturdays and Sundays. That gets used in a group by operation.

weekdays = df.index.to_series().mask(df.index.dayofweek >= 5).bfill()

d_ = df.groupby(weekdays).sum()
d_

               Val_1     Val_2     Val_3
2018-07-18  0.444939  0.278735  0.651676
2018-07-19  0.407554  0.609862  0.136097
2018-07-20  0.460148  0.085823  0.544838
2018-07-23  0.944475  2.442691  1.016837
2018-07-24  0.850445  0.691271  0.713614
2018-07-25  0.817744  0.377185  0.776050
2018-07-26  0.777962  0.225146  0.542329
2018-07-27  0.757983  0.435280  0.836541
2018-07-30  2.645824  2.198333  1.375860
2018-07-31  0.926879  0.018688  0.746060
2018-08-01  0.721535  0.700566  0.373741
2018-08-02  0.117642  0.900749  0.603536
2018-08-03  0.145906  0.764869  0.775801
2018-08-06  0.738110  1.580137  1.266593

Compare

df.join(d_, rsuffix='_')

               Val_1     Val_2     Val_3   day_name    Val_1_    Val_2_    Val_3_
2018-07-18  0.444939  0.278735  0.651676  Wednesday  0.444939  0.278735  0.651676
2018-07-19  0.407554  0.609862  0.136097   Thursday  0.407554  0.609862  0.136097
2018-07-20  0.460148  0.085823  0.544838     Friday  0.460148  0.085823  0.544838
2018-07-21  0.465239  0.836997  0.035073   Saturday       NaN       NaN       NaN
2018-07-22  0.462691  0.739635  0.275079     Sunday       NaN       NaN       NaN
2018-07-23  0.016545  0.866059  0.706685     Monday  0.944475  2.442691  1.016837
2018-07-24  0.850445  0.691271  0.713614    Tuesday  0.850445  0.691271  0.713614
2018-07-25  0.817744  0.377185  0.776050  Wednesday  0.817744  0.377185  0.776050
2018-07-26  0.777962  0.225146  0.542329   Thursday  0.777962  0.225146  0.542329
2018-07-27  0.757983  0.435280  0.836541     Friday  0.757983  0.435280  0.836541
2018-07-28  0.934829  0.700900  0.538186   Saturday       NaN       NaN       NaN
2018-07-29  0.831104  0.700946  0.185523     Sunday       NaN       NaN       NaN
2018-07-30  0.879891  0.796487  0.652151     Monday  2.645824  2.198333  1.375860
2018-07-31  0.926879  0.018688  0.746060    Tuesday  0.926879  0.018688  0.746060
2018-08-01  0.721535  0.700566  0.373741  Wednesday  0.721535  0.700566  0.373741
2018-08-02  0.117642  0.900749  0.603536   Thursday  0.117642  0.900749  0.603536
2018-08-03  0.145906  0.764869  0.775801     Friday  0.145906  0.764869  0.775801
2018-08-04  0.199844  0.253200  0.091238   Saturday       NaN       NaN       NaN
2018-08-05  0.437564  0.548054  0.504035     Sunday       NaN       NaN       NaN
2018-08-06  0.100702  0.778883  0.671320     Monday  0.738110  1.580137  1.266593
piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

Setup data using a simple series so that the weekend roll value is obvious:

index = pd.date_range(start='2018-07-18', periods = 20, freq = 'D')

df = pd.DataFrame({
    'Val_1': [1] * 20,
    'Val_2': [2] * 20,
    'Val_3': [3] * 20,
  },
  index = index)

You can take the cumulative sum of the relevant columns in your dataframe, and then difference the results using a weekday boolean filter. You need to apply some special logic to correctly account for the first day(s) depending on whether it is a weekday, a Saturday or a Sunday.

The correct roll behavior can be observed using an index start date of July 21st (Saturday) and the 22nd (Sunday).

In addition, you may need to account for the situation where the last day or two falls on a weekend. As is, those values would be lost. Depending on the situation, you may wish to roll them forwards to the following Monday (in which case you would need to extend your index) or else roll them back to the preceding Friday.

weekdays = df.index.dayofweek < 5

df2 = df.iloc[:, :].cumsum()[weekdays].diff()
if weekdays[0]:
    # First day is a weekday, so just use its value.
    df2.iloc[0, :] = df.iloc[0, :]
elif weekdays[1]:
    # First day must be a Sunday.
    df2.iloc[0, :] = df.iloc[0:2, :].sum()
else:
    # First day must be a Saturday.
    df2.iloc[0, :] = df.iloc[0:3, :].sum()

>>> df2.head(14)
            Val_1  Val_2  Val_3
2018-07-18      1      2      3
2018-07-19      1      2      3
2018-07-20      1      2      3
2018-07-23      3      6      9
2018-07-24      1      2      3
2018-07-25      1      2      3
2018-07-26      1      2      3
2018-07-27      1      2      3
2018-07-30      3      6      9
2018-07-31      1      2      3
2018-08-01      1      2      3
2018-08-02      1      2      3
2018-08-03      1      2      3
2018-08-06      3      6      9
Alexander
  • 105,104
  • 32
  • 201
  • 196