I made my own definition of the MLK Day holiday that starts not when the holiday was first observed nationally, but when it was first observed by the NYSE. The NYSE first observed MLK Day in January of 1998.

When I ask the Holiday for the dates on which the holiday occurred within a date range, it mostly works fine: it returns an empty set when no MLK date falls in the requested range, and the appropriate date when one does. For date ranges that precede the start_date of the holiday it correctly returns the empty set, until the range reaches 1995, and then it fails. I cannot figure out why it fails then and not in the other cases where the empty set is the correct answer.

Note: I am still stuck on pandas 0.22.0, with Python 3.

import pandas as pd
from datetime import datetime
from dateutil.relativedelta import MO
from pandas.tseries.holiday import Holiday

__author__ = 'eb'

mlk_rule = Holiday('MLK Day (NYSE Observed)',
                   start_date=datetime(1998, 1, 1), month=1, day=1,
                   offset=pd.DateOffset(weekday=MO(3)))

start = pd.to_datetime('1999-01-17')
end = pd.to_datetime('1999-05-01')
finish = pd.to_datetime('1980-01-01')
while start > finish:
    print(f"{start} - {end}:")
    try:
        dates = mlk_rule.dates(start, end, return_name=True)
    except Exception as e:
        print("\t****** Fail *******")
        print(f"\t{e}")
        break
    print(f"\t{dates}")
    start = start - pd.DateOffset(years=1)
    end = end - pd.DateOffset(years=1)

When run, this results in:

1999-01-17 00:00:00 - 1999-05-01 00:00:00:
    1999-01-18    MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1998-01-17 00:00:00 - 1998-05-01 00:00:00:
    1998-01-19    MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1997-01-17 00:00:00 - 1997-05-01 00:00:00:
    Series([], dtype: object)
1996-01-17 00:00:00 - 1996-05-01 00:00:00:
    Series([], dtype: object)
1995-01-17 00:00:00 - 1995-05-01 00:00:00:
    ****** Fail *******
    Must provide freq argument if no data is supplied

What happens in 1995 that causes it to fail, and that does not happen for the same periods in 1996 and 1997?

lczapski
ebergerson

1 Answer

Inside the Holiday class, the dates() method gathers the list of valid holidays within a requested date range. To ensure that this works properly, the implementation gathers all holidays from one year before to one year after the requested date range via the internal _reference_dates() method. In this method, if the Holiday instance has its own start_date or end_date, it uses that date as the beginning or end of the range to examine instead of the requested one, even when the requested range lies entirely before the rule's start_date or after its end_date.
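To make the failure in the question concrete, here is a simplified model of the year arithmetic described above. It is a sketch in plain Python, not the pandas source: the function name reference_years is hypothetical, and it assumes the rule's start_date always replaces the requested start, followed by one year of padding on each side.

```python
from datetime import datetime


def reference_years(rule_start, req_start, req_end):
    """Hypothetical model of the clamping described above (not pandas code)."""
    # Assumption: the rule's own start_date replaces the requested start,
    # regardless of where the requested range lies.
    start = rule_start if rule_start is not None else req_start
    # One year of padding on each side of the (clamped) range.
    first, last = start.year - 1, req_end.year + 1
    return list(range(first, last + 1))  # empty when last < first


rule_start = datetime(1998, 1, 1)
for year in (1997, 1996, 1995):
    years = reference_years(rule_start,
                            datetime(year, 1, 17), datetime(year, 5, 1))
    print(year, years)
# 1997 [1997, 1998]
# 1996 [1997]
# 1995 []
```

Under these assumptions the 1995 request is the first one whose padded end (1996) falls before the padded rule start (1997), so the reference range is empty. That matches the year in which the question's loop starts raising the exception.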

The existing implementation mistakenly assumes that it only needs to answer correctly within the range over which the holiday exists. As part of a set of rules in a calendar, it is just as important for a Holiday to report where holidays do not exist as where they do. Returning the empty set is an important function of the Holiday class.

For example, a trading day calendar that identifies when financial markets are open or closed may need to report accurately which days the market was closed over a 100-year history. The market has closed for MLK Day for only a small part of that history. A calendar that includes the MLK holiday as constructed above throws an error when asked for the open days or holidays in periods preceding the MLK start_date[1].

To fix this, I re-implemented the _reference_dates() method in a custom subclass of Holiday. When the requested date range extends before the start_date or after the end_date of the holiday rule, it builds the reference dates from the actual requested range rather than bounding the request by the rule's internal start and end dates.

Here is the implementation I am using.

import pandas as pd
from datetime import datetime
from dateutil.relativedelta import MO
from pandas.tseries.holiday import Holiday


class MLKHoliday(Holiday):

    def __init__(self):
        super().__init__('MLK Day (NYSE Observed)',
                         start_date=datetime(1998, 1, 1), month=1, day=1,
                         offset=pd.DateOffset(weekday=MO(3)))

    def _reference_dates(self, start_date, end_date):
        """
        Get reference dates for the holiday.

        Return reference dates for the holiday also returning the year
        prior to the start_date and year following the end_date.  This ensures
        that any offsets to be applied will yield the holidays within
        the passed in dates.
        """
        # Clamp to the rule's start_date only when the request begins on or
        # after it; an earlier request keeps its own start.
        if (self.start_date is not None and start_date is not None
                and start_date >= self.start_date):
            start_date = self.start_date.tz_localize(start_date.tz)

        # Likewise, clamp to the rule's end_date only when the request ends
        # on or before it.
        if (self.end_date is not None and end_date is not None
                and end_date <= self.end_date):
            end_date = self.end_date.tz_localize(end_date.tz)

        year_offset = pd.DateOffset(years=1)
        reference_start_date = pd.Timestamp(
            datetime(start_date.year - 1, self.month, self.day))

        reference_end_date = pd.Timestamp(
            datetime(end_date.year + 1, self.month, self.day))
        # Don't process unnecessary holidays.
        # pd.date_range replaces the DatetimeIndex(start=..., end=...)
        # constructor form, which later pandas releases removed.
        dates = pd.date_range(start=reference_start_date,
                              end=reference_end_date,
                              freq=year_offset, tz=start_date.tz)
        return dates
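The effect of the conditional clamping in _reference_dates() can be checked with the same kind of plain-Python year arithmetic. This is only a sketch: fixed_reference_years is a hypothetical helper that mirrors the start_date condition of the subclass, it ignores end_date (the MLK rule has none), and it does not call pandas.

```python
from datetime import datetime


def fixed_reference_years(rule_start, req_start, req_end):
    """Hypothetical model of the conditional clamping (not pandas code)."""
    start = req_start
    # Clamp to the rule's start only when the request begins on or after it.
    if rule_start is not None and req_start >= rule_start:
        start = rule_start
    # One year of padding on each side, as in _reference_dates().
    return list(range(start.year - 1, req_end.year + 2))


rule_start = datetime(1998, 1, 1)
print(fixed_reference_years(rule_start,
                            datetime(1995, 1, 17), datetime(1995, 5, 1)))
# [1994, 1995, 1996]
print(fixed_reference_years(rule_start,
                            datetime(1999, 1, 17), datetime(1999, 5, 1)))
# [1997, 1998, 1999, 2000]
```

With the condition in place, a request that lies entirely before the rule's start_date still produces a non-empty set of reference years, so the later filtering step has something to filter down to the empty set.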

Does anyone know if this has been fixed in a more up-to-date version of pandas?

[1] Note: As constructed in the original question, mlk_rule still returns the empty set from dates() for ranges just preceding its start_date; it only starts throwing exceptions for ranges a year or so earlier than that. This is because _reference_dates() extends the date range by a year in each direction, which masks the mistaken assumption for requests that fall close to the start_date.

ebergerson