1

In a data frame, I wish to drop all the rows with dates which do not fall on Last Friday of the Month.

But if data does not exist on a Friday (being a holiday) then retain row from a day before i.e. Thursday.

If Friday and Thursday both are holidays (no data) then retain Wednesday and so on.

Basically if Friday is a holiday go a day back to Thursday, if Thursday is a holiday go back to Wednesday and so on.

Data:

[['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Turnover', 'weekday', 'weekday_name'], ['30-10-2020', 0.5, 0.4, 0.89, 0.21, 0.09, 0.82, 4, 'Friday'], ['31-10-2020', 0.49, 0.58, 0.24, 0.81, 0.76, 0.86, 5, 'Saturday'], ['02-11-2020', 0.21, 0.31, 0.92, 0.7, 0.07, 0.45, 0, 'Monday'], ['03-11-2020', 0.2, 0.09, 0.25, 0.58, 0.61, 0.36, 1, 'Tuesday'], ['04-11-2020', 0.93, 0.61, 0.89, 0.31, 0.01, 0.04, 2, 'Wednesday'], ['05-11-2020', 0.14, 0.52, 0.79, 0.44, 0.46, 0.15, 3, 'Thursday'], ['18-11-2020', 0.75, 0.36, 0.41, 0.73, 0.48, 0.83, 2, 'Wednesday'], ['19-11-2020', 0.08, 0.86, 0.56, 0.13, 0.15, 0.28, 3, 'Thursday'], ['20-11-2020', 0.99, 0.23, 0.47, 0.87, 0.51, 0.59, 4, 'Friday'], ['23-11-2020', 0.84, 0.67, 0.83, 0.3, 0.34, 0.98, 0, 'Monday'], ['24-11-2020', 0.46, 0.49, 0.1, 0.36, 0.07, 0.6, 1, 'Tuesday'], ['29-11-2020', 0.45, 0.45, 0.98, 0.93, 0.73, 0.08, 6, 'Sunday'], ['01-12-2020', 0.17, 0.89, 0.63, 0.05, 0.04, 0.57, 1, 'Tuesday'], ['02-12-2020', 0.17, 0.22, 0.71, 0.72, 0.13, 0.69, 2, 'Wednesday'], ['03-12-2020', 0.45, 0.69, 0.69, 0.41, 0.39, 0.1, 3, 'Thursday'], ['17-12-2020', 0.15, 0.99, 0.58, 0.42, 0.97, 0.74, 3, 'Thursday'], ['18-12-2020', 0.22, 0.89, 0.76, 0.25, 0.96, 0.56, 4, 'Friday'], ['21-12-2020', 0.92, 0.98, 0.1, 0.22, 0.01, 0.51, 0, 'Monday'], ['22-12-2020', 0.65, 0.42, 0.7, 0.81, 0.92, 0.64, 1, 'Tuesday'], ['23-12-2020', 0.8, 0.46, 0.91, 0.38, 0.15, 0.12, 2, 'Wednesday'], ['24-12-2020', 0.71, 0.77, 0.42, 0.42, 0.49, 0.49, 3, 'Thursday'], ['28-12-2020', 0.6, 0.79, 0.72, 0.0, 0.01, 0.54, 0, 'Monday'], ['29-12-2020', 0.1, 0.71, 0.06, 0.28, 0.43, 0.55, 1, 'Tuesday'], ['30-12-2020', 0.35, 0.53, 0.75, 0.64, 0.46, 0.27, 2, 'Wednesday'], ['31-12-2020', 0.58, 0.67, 0.4, 0.81, 0.34, 0.84, 3, 'Thursday'], ['11-01-2021', 0.31, 0.02, 0.67, 0.19, 0.8, 0.36, 0, 'Monday'], ['13-01-2021', 0.12, 0.22, 0.76, 0.03, 0.52, 0.54, 2, 'Wednesday'], ['15-01-2021', 0.95, 0.62, 0.41, 0.35, 0.76, 0.23, 4, 'Friday'], ['18-01-2021', 0.89, 0.03, 0.63, 0.6, 0.89, 0.89, 0, 'Monday'], ['19-01-2021', 0.01, 0.37, 0.26, 0.85, 0.31, 0.41, 1, 'Tuesday'], ['21-01-2021', 0.47, 0.03, 0.84, 0.7, 0.94, 0.31, 3, 'Thursday'], ['22-01-2021', 0.36, 0.41, 0.14, 0.82, 0.55, 0.78, 4, 'Friday'], ['25-01-2021', 0.42, 0.76, 0.0, 0.33, 0.22, 0.54, 0, 'Monday'], ['27-01-2021', 0.68, 0.93, 0.85, 0.74, 0.27, 0.99, 2, 'Wednesday'], ['30-01-2021', 0.07, 0.2, 0.8, 0.42, 0.75, 0.71, 5, 'Saturday'], ['01-02-2021', 0.19, 0.24, 0.78, 0.19, 0.55, 0.03, 0, 'Monday'], ['02-02-2021', 0.32, 0.49, 0.1, 0.29, 0.07, 0.67, 1, 'Tuesday'], ['04-02-2021', 0.6, 0.16, 0.28, 0.22, 0.43, 0.81, 3, 'Thursday'], ['05-02-2021', 0.78, 0.9, 0.72, 0.88, 0.6, 0.65, 4, 'Friday'], ['15-02-2021', 0.65, 0.98, 0.58, 0.14, 0.91, 0.88, 0, 'Monday'], ['17-02-2021', 0.82, 0.89, 0.45, 0.71, 0.7, 0.18, 2, 'Wednesday'], ['18-02-2021', 0.99, 0.79, 0.67, 0.52, 0.95, 0.01, 3, 'Thursday'], ['19-02-2021', 0.95, 0.6, 0.18, 0.01, 0.72, 0.72, 4, 'Friday'], ['22-02-2021', 0.73, 0.21, 0.88, 0.86, 0.54, 0.69, 0, 'Monday'], ['24-02-2021', 0.7, 0.38, 0.39, 0.52, 0.51, 0.13, 2, 'Wednesday'], ['25-02-2021', 0.17, 0.66, 0.73, 0.01, 0.38, 0.95, 3, 'Thursday'], ['26-02-2021', 0.03, 0.62, 0.48, 0.85, 0.36, 0.06, 4, 'Friday'], ['01-03-2021', 0.13, 0.38, 0.09, 0.37, 0.49, 0.83, 0, 'Monday'], ['02-03-2021', 0.07, 0.26, 0.44, 0.18, 0.88, 0.66, 1, 'Tuesday'], ['03-03-2021', 0.05, 0.0, 0.19, 0.59, 0.01, 0.4, 2, 'Wednesday'], ['10-03-2021', 0.27, 0.71, 0.24, 0.29, 0.16, 0.64, 2, 'Wednesday'], ['15-03-2021', 0.09, 0.1, 0.38, 0.54, 0.42, 0.38, 0, 'Monday'], ['16-03-2021', 0.87, 0.13, 0.54, 0.49, 0.49, 0.83, 1, 'Tuesday'], ['17-03-2021', 0.95, 0.33, 0.66, 0.97, 0.11, 0.91, 2, 'Wednesday'], ['18-03-2021', 0.26, 0.57, 0.52, 0.94, 0.44, 0.3, 3, 'Thursday'], ['23-03-2021', 0.76, 0.83, 0.04, 0.43, 0.44, 0.98, 1, 'Tuesday'], ['24-03-2021', 0.51, 0.79, 0.18, 0.26, 0.98, 0.27, 2, 'Wednesday'], ['25-03-2021', 0.3, 0.12, 0.17, 0.76, 0.14, 0.63, 3, 'Thursday'], ['30-03-2021', 0.77, 0.42, 0.1, 0.87, 0.46, 0.82, 1, 'Tuesday'], ['31-03-2021', 0.76, 0.16, 0.0, 0.38, 0.07, 0.69, 2, 'Wednesday'], ['01-04-2021', 0.43, 0.78, 0.09, 0.91, 0.21, 0.29, 3, 'Thursday'], ['05-04-2021', 0.63, 0.48, 0.61, 0.27, 0.33, 0.93, 0, 'Monday'], ['13-04-2021', 0.07, 0.94, 0.89, 0.8, 0.79, 0.68, 1, 'Tuesday'], ['20-04-2021', 0.54, 0.61, 0.11, 0.01, 0.74, 0.96, 1, 'Tuesday'], ['23-04-2021', 0.8, 0.56, 0.97, 0.68, 0.13, 0.06, 4, 'Friday'], ['26-04-2021', 0.64, 0.02, 0.45, 0.87, 0.08, 0.75, 0, 'Monday'], ['27-04-2021', 0.04, 0.91, 0.42, 0.61, 0.38, 0.95, 1, 'Tuesday'], ['28-04-2021', 0.61, 0.32, 0.43, 0.37, 0.53, 0.24, 2, 'Wednesday'], ['29-04-2021', 0.44, 0.26, 0.65, 0.39, 0.44, 0.9, 3, 'Thursday'], ['30-04-2021', 0.18, 0.89, 0.2, 0.33, 0.99, 0.15, 4, 'Friday']]
Dan
  • 77
  • 7

1 Answers1

1

Using holidays, you can adjust recursively to achieve dates you want. It's should be efficient as it is vectorised.

import pandas as pd
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar

df = pd.DataFrame({"date":pd.date_range("1-jan-2020", "31-dec-2021")})

# select holidays you want...
holidays = USFederalHolidayCalendar().holidays(start=df["date"].min(), end=df["date"].max())

# get last Friday of every month
df = df.loc[df["date"].dt.dayofweek.eq(4)].groupby([df["date"].dt.month, df["date"].dt.year], as_index=False).max()

# use recursion to adjust dates if instance of a holiday or weekend
def adjusthol(df, holidays):
    if df["date"].isin(holidays).any() or df["date"].dt.dayofweek.isin([5,6]).any():
        df["date"] = np.where(df["date"].isin(holidays), df["date"] - pd.Timedelta(days=1), df["date"])
        df = adjusthol(df, holidays)
    return df


adjusthol(df, holidays)

output

      date
2020-01-31
2021-01-29
2020-02-28
2021-02-26
2020-03-27
2021-03-26
2020-04-24
2021-04-30
2020-05-29
2021-05-28
2020-06-26
2021-06-25
2020-07-31
2021-07-30
2020-08-28
2021-08-27
2020-09-25
2021-09-24
2020-10-30
2021-10-29
2020-11-27
2021-11-26
2020-12-24
2021-12-30

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • Last Friday falls on 25th Dec 2020 which is a holiday so it should return 24th Dec 2020 Thursday) instead of 11 2020-12-31 00:00:00 Thursday – Dan May 14 '21 at 09:30
  • I've re-structured answer as a recursive one, so it achieves you complete requirement. Have not tested if you land on a weekend as have no test cases... – Rob Raymond May 15 '21 at 17:12