So I can do this by brute force, but it's painfully slow, so I'm sure I'm missing something.
Let's say that I want to create a (daily) DatetimeIndex of fixed length, say 15 days, but there are some caveats:
- if the 15-day index ends on a weekend, then it actually ends on the last Friday in the index, and
- if the 15-day period contains censored dates, then the censored dates do not count towards the 15-day count. The censored dates can also run well past the 15-day period.
To elaborate on the second point, let's say I start on 2018-01-01, but 2018-01-12 to 2018-02-14 are censored, so my 15-day period could be (brute-force approach):
import pandas as pd

possible = pd.date_range(start='2018-01-01', end='2018-12-31')
censored = pd.date_range(start='2018-01-12', end='2018-02-14')

# first 15 non-censored calendar days (weekends still count at this stage)
bforce = (pd.DatetimeIndex(set(possible).difference(set(censored)))
          .sort_values()[:15])

# drop Saturdays (5) and Sundays (6) from the final index
idx = pd.DatetimeIndex([d for d in bforce if d.weekday() not in (5, 6)])
which gives:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-08', '2018-01-09', '2018-01-10',
'2018-01-11', '2018-02-15', '2018-02-16'],
dtype='datetime64[ns]', freq=None)
which is correct. Note that weekends are dropped from the final index, but I deliberately did not add them to the censored dates, since that would stop them counting towards the 15 days and push the period out even further. In other words, weekends count towards the 15-day window but never appear in the result, and if the window would end on a weekend the index simply ends on the preceding Friday.
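To make that distinction concrete, here is a small sketch reusing possible and censored from above (the variable names are just mine, for illustration), contrasting the two interpretations:

# weekends count towards the 15 days but get dropped afterwards (what I want)
noncensored = possible.difference(censored)
window = noncensored[:15]
wanted = window[window.weekday < 5]

# weekends excluded from the count entirely (not what I want: the window ends later)
weekdays_only = noncensored[noncensored.weekday < 5]
too_long = weekdays_only[:15]

With the dates above, wanted should end on 2018-02-16 while too_long runs on to 2018-02-22.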
The above is, clearly, a mess. I'm hoping there is a cleaner way to do this, in particular one that avoids pre-building a longer index than I need to start with, as well as the multiple intermediate set and list constructions.
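For what it's worth, the most direct thing I can think of that avoids pre-building the year-long index is a plain day-by-day loop; this is only a sketch, and first_n_uncensored is just a name I made up:

def first_n_uncensored(start, censored, n=15):
    # walk forward one calendar day at a time; censored days don't count
    censored = set(censored)
    days, cur = [], pd.Timestamp(start)
    while len(days) < n:
        if cur not in censored:
            days.append(cur)
        cur += pd.Timedelta(days=1)
    # weekends counted above, but dropped from the result
    return pd.DatetimeIndex([d for d in days if d.weekday() < 5])

first_n_uncensored('2018-01-01', censored)

But that throws pandas away entirely, so a vectorised/idiomatic equivalent would be ideal.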