I am analyzing, hour by hour, the operation of devices that run 24 hours a day (but not every day of the year), and I get an error with:

from pandas.tseries.offsets import CustomBusinessHour

Use=CustomBusinessHour(
    start='00:00',
    end='24:00',
    weekmask=(1,1,1,1,1,1,0)
)

or

Use=CustomBusinessHour(
    start='00:00',
    end='00:00',
    weekmask=(1,1,1,1,1,1,0)
)

(Using end='23:59' behaves strangely: it shifts by a minute every day. Using end='23:00' is not suitable either: it stops at 22:00 instead of 23:00.)

Is it possible to create a CustomBusinessHour that covers the whole day?

My goal is to generate a Series of hours worked in a year: pd.date_range(dt.date(2020,1,1), dt.date(2021,1,1), closed='left', freq=Use)

Thanks a lot in advance.

dge

2 Answers

The accepted answer will not give the intended results if the period between the start and end dates contains a day that is not a business day. This is because Series.asfreq() will fill in the missing non-business days.
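The effect can be reproduced with a small sketch. The date span and the Monday-to-Saturday weekmask below are assumptions chosen for illustration, so that the span contains one Sunday (2020-01-05):

```python
import pandas as pd

# Working days Mon-Sat over a span that contains a Sunday (2020-01-05).
calendar = pd.offsets.CustomBusinessDay(weekmask="Mon Tue Wed Thu Fri Sat")
days = pd.date_range("2020-01-03", "2020-01-07", freq=calendar)

# Upsampling to hourly frequency fills every hour between the first and
# last working day, including the hours of the skipped Sunday.
hourly = days.to_series().asfreq("h").index

sunday_hours = hourly[hourly.dayofweek == 6]
print(len(days), len(sunday_hours))  # 4 working days, 24 unwanted Sunday hours
```

The Sunday is absent from `days`, yet all 24 of its hours reappear in `hourly`.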

My solution is to generate the index in two steps: first get the days we want, then build an hourly series for each day and join them together:

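import pandas as pd

def custom_24h_day_date_range(start_date, end_date, n=1, holidays=None, weekmask="Mon Tue Wed Thu Fri"):
    # Step 1: select the working days.
    freq = pd.offsets.CustomBusinessDay(
        n=n,
        holidays=holidays,
        weekmask=weekmask
    )
    days = pd.date_range(start_date, end_date, freq=freq)
    # Step 2: expand each day into its 24 hourly timestamps and combine them.
    # This avoids union_many() and date_range(closed=...), which recent pandas versions no longer provide.
    return pd.DatetimeIndex([ts for d in days for ts in pd.date_range(d, periods=24, freq='h')])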

Obviously you can extend it to make it as flexible as you want to suit your needs.
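As one possible extension, here is a minimal sketch of a different technique: build the full hourly range once and keep only the hours that fall on working days. The helper name `hourly_on_working_days` and its defaults are hypothetical, and it assumes timezone-naive inputs:

```python
import pandas as pd

def hourly_on_working_days(start, end, weekmask="Mon Tue Wed Thu Fri Sat", holidays=None):
    """Hypothetical helper: hourly timestamps in [start, end) on working days only."""
    calendar = pd.offsets.CustomBusinessDay(weekmask=weekmask, holidays=holidays)
    # Working days, anchored at midnight so they can be compared with normalized hours.
    working_days = pd.date_range(pd.Timestamp(start).normalize(), end, freq=calendar)
    # Every hour in the span; drop the right endpoint to get a left-closed range.
    hours = pd.date_range(start, end, freq="h")
    hours = hours[hours < pd.Timestamp(end)]
    # Keep only the hours whose calendar day is a working day.
    return hours[hours.normalize().isin(working_days)]

idx = hourly_on_working_days("2020-01-01", "2021-01-01")
print(idx[0], idx[-1], len(idx))
```

This trades the per-day loop for a single boolean mask, which may be simpler to adapt when the calendar rules change.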

It's a shame the canonical pandas solution doesn't work as it should.

Griffin

I worked around the issue by using CustomBusinessDay instead of CustomBusinessHour:

Use=CustomBusinessDay(weekmask=(1,1,1,1,1,1,0))

then

Cal=pd.date_range(dt.date(2020,1,1),dt.date(2021,1,1),freq=Use,closed='left').to_series()

I converted the index to a Series so that I could append one extra day: asfreq() stops as soon as the last day is reached, so it does not sample hour by hour through to the end of that day.

Cal.loc[Cal.iloc[-1]+pd.Timedelta(days=1)]=Cal.iloc[-1]+pd.Timedelta(days=1)

Finally, I resample to hourly frequency, dropping the last entry, which is the first hour of the following day:

Cal=Cal.asfreq('H')[:-1].index
dge