So I can do this by brute force, but it's painfully slow, so I'm sure I'm missing something.
Let's say that I want to create a (daily) DatetimeIndex of fixed length, say 15 days, but there are some caveats:
- if the 15-day index ends on a weekend, then it actually ends on the last Friday in the index, and
- if the 15-day period contains censored dates, then the censored dates do not count towards the 15-day count. The censored dates can also run well past the 15-day period.
To elaborate on the second point, let's say I start on 2018-01-01, but 2018-01-12 to 2018-02-14 are censored, so my 15-day period could be (brute-force approach):
import pandas as pd

possible = pd.date_range(start='2018-01-01', end='2018-12-31')
censored = pd.date_range(start='2018-01-12', end='2018-02-14')

# first 15 non-censored calendar days (weekends still count at this stage)
bforce = (pd.DatetimeIndex(set(possible).difference(set(censored)))
          .sort_values()[:15])

# drop Saturdays (5) and Sundays (6) from the final index
idx = pd.DatetimeIndex([d for d in bforce if d.weekday() not in (5, 6)])
which gives:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-08', '2018-01-09', '2018-01-10',
'2018-01-11', '2018-02-15', '2018-02-16'],
dtype='datetime64[ns]', freq=None)
which is correct. Note that weekends are dropped from the final index, but I deliberately did not add them to the censored dates, since that would stop them counting towards the 15 days and push the period out even further. In other words, weekends count towards the 15-day window but never appear in the result, and if the window would end on a weekend the index simply ends on the preceding Friday.
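To make that distinction concrete, here is a small sketch reusing possible and censored from above (the variable names are just mine, for illustration), contrasting the two interpretations:

# weekends count towards the 15 days but get dropped afterwards (what I want)
noncensored = possible.difference(censored)
window = noncensored[:15]
wanted = window[window.weekday < 5]

# weekends excluded from the count entirely (not what I want: the window ends later)
weekdays_only = noncensored[noncensored.weekday < 5]
too_long = weekdays_only[:15]

With the dates above, wanted should end on 2018-02-16 while too_long runs on to 2018-02-22.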
The above is, clearly, a mess. I'm hoping there is a cleaner way to do this, in particular one that avoids pre-building a longer index than I need to start with, as well as the multiple intermediate set and list constructions.
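For what it's worth, the most direct thing I can think of that avoids pre-building the year-long index is a plain day-by-day loop; this is only a sketch, and first_n_uncensored is just a name I made up:

def first_n_uncensored(start, censored, n=15):
    # walk forward one calendar day at a time; censored days don't count
    censored = set(censored)
    days, cur = [], pd.Timestamp(start)
    while len(days) < n:
        if cur not in censored:
            days.append(cur)
        cur += pd.Timedelta(days=1)
    # weekends counted above, but dropped from the result
    return pd.DatetimeIndex([d for d in days if d.weekday() < 5])

first_n_uncensored('2018-01-01', censored)

But that throws pandas away entirely, so a vectorised/idiomatic equivalent would be ideal.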