0

The goal is to convert a pandas timestamp object to week of the year according to the following user defined calendars for 2021 and 2022 respectively.

enter image description here

enter image description here

I am using the week property of pandas datetime, which worked for dates in 2021, but they are breaking down for the next year. This is the initial function I wrote.

def week(date: pandas.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    orig_week: int = date.week % 53
    return orig_week + 1 if date.dayofweek < 5 else orig_week + 2

I added the modulo 53 since without it sometimes the rest of the logic gives me numbers like 54 or something. But I am not sure about the internal logic of the pandas week property, so cannot really get a grip on how to convert that week to the above user defined calendar, although it sounds like a simple shift. The problems are around the edge cases (end or beginning of year). So any help will be appreciated.

Della
  • 1,264
  • 2
  • 15
  • 32
  • pandas timestamp has `weekday` method, did you try it? – ThePyGuy Oct 26 '21 at 07:56
  • "*user defined calendars for 2021 and 2022*" - so what is the definition? not to mention that re-inventing date/time or calendars is almost never a good idea ;-) – FObersteiner Oct 26 '21 at 07:57
  • @MrFuppes well, customer is the boss. The only _definition_ I have is the figures. It is sort of like a truth table, not an expression or algorithm. – Della Oct 26 '21 at 08:02
  • @ThePyGuy is not it the same property as the ```week```? Also, having the method is not the concern, but how to convert that to the customer defined calendar week. – Della Oct 26 '21 at 08:03
  • Is your `week` function specially for supporting for a particular year, or need to support any year ? – SeaBean Oct 26 '21 at 09:35
  • @SeaBean for now it has to support all years, but I am trying to develop a dirty method which will cover at least five years. – Della Oct 26 '21 at 10:13
  • Will the `week` function be called very frequently ? – SeaBean Oct 26 '21 at 10:25
  • @SeaBean yes, so some sort of brute force look up for every call is not feasible. I was thinking of using brute force to build a dictionary mapping for 5-10 years which should not take long, then I can get O(1) lookup from the dictionary. But thanks for your answer below, it is really not as simple as it was looking to me. – Della Oct 26 '21 at 11:15
  • Della, the logics in my answer is a general approach and can support any year. As it needs to handle in generic way, it may looks a bit complicated. But as you can do it once and use for many years, it's worth a generic logics. Feel free to let me know if you need any further info to understand it more. – SeaBean Oct 26 '21 at 11:30
  • Della, any question regarding using the solution ? – SeaBean Oct 28 '21 at 08:53
  • Thanks, not so far. I am still kinda going through the code, actually you have used some pandas API I was not aware of. So digesting it. But it works. – Della Oct 28 '21 at 23:17
  • Della, you can take a look at [this post](https://stackoverflow.com/questions/32444154/pandas-custom-week) to understand more. I based on it to get an idea for this solution. But it needs to be enhanced to properly set the 1st week (could be week 53 for the week of Jan 1) and also the last week of the year (could be week 1 instead of week 53). That's why we use np.where to adjust for these 2 cases. – SeaBean Oct 29 '21 at 07:01

1 Answers1

1

One solution is to create a custom calendar for the input year. The dates in the year will be converted to a Period representing weeks that start on Saturday and end on Friday.

import pandas as pd

def week(in_date: pd.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    # Setup a custom calendar table for the year of input date
    in_date_year = str(in_date.year)
    _df = pd.DataFrame({'Date':pd.date_range(in_date_year+'-1-1', in_date_year+'-12-31')})
    _df['Period'] = _df['Date'].dt.to_period('W-FRI')   # define week period that starts on SAT and ends on FRI 
    _df['Week_Num'] = _df['Period'].dt.week

    # Adjust week number for year start 
    _df['Week_Num'] = np.where(_df['Week_Num'].iloc[0] >= 52, _df['Week_Num'] % 53 + 1, _df['Week_Num'])
    # Adjust week number for year end
    _df.iloc[-7:, _df.columns.get_loc('Week_Num')] = np.where(_df['Week_Num'].iloc[-7:] < 52, 53, _df['Week_Num'].iloc[-7:])

    # Get week number and return
    return _df.loc[_df['Date'] == in_date, 'Week_Num'].iat[0]

The week numbers returned for the Period will be set according to weeks that start on SAT and end on FRI. However, for the year start and year end, the week number may still show week number corresponding to the previous/next year. Hence, we checked and adjusted for this year start/end discrepancy accordingly.

Result:

week(pd.Timestamp('2022-01-01'))
#output
1

week(pd.Timestamp('2022-12-31'))
#output
53

The underlying table is built as follows:

For year 2022:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2022-01-01  2022-01-01/2022-01-07         1
1 2022-01-02  2022-01-01/2022-01-07         1
2 2022-01-03  2022-01-01/2022-01-07         1
3 2022-01-04  2022-01-01/2022-01-07         1
4 2022-01-05  2022-01-01/2022-01-07         1
5 2022-01-06  2022-01-01/2022-01-07         1
6 2022-01-07  2022-01-01/2022-01-07         1
7 2022-01-08  2022-01-08/2022-01-14         2
8 2022-01-09  2022-01-08/2022-01-14         2
9 2022-01-10  2022-01-08/2022-01-14         2


# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2022-12-22  2022-12-17/2022-12-23        51
356 2022-12-23  2022-12-17/2022-12-23        51
357 2022-12-24  2022-12-24/2022-12-30        52
358 2022-12-25  2022-12-24/2022-12-30        52
359 2022-12-26  2022-12-24/2022-12-30        52
360 2022-12-27  2022-12-24/2022-12-30        52
361 2022-12-28  2022-12-24/2022-12-30        52
362 2022-12-29  2022-12-24/2022-12-30        52
363 2022-12-30  2022-12-24/2022-12-30        52
364 2022-12-31  2022-12-31/2023-01-06        53

For year 2025:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2025-01-01  2024-12-28/2025-01-03         1
1 2025-01-02  2024-12-28/2025-01-03         1
2 2025-01-03  2024-12-28/2025-01-03         1
3 2025-01-04  2025-01-04/2025-01-10         2
4 2025-01-05  2025-01-04/2025-01-10         2
5 2025-01-06  2025-01-04/2025-01-10         2
6 2025-01-07  2025-01-04/2025-01-10         2
7 2025-01-08  2025-01-04/2025-01-10         2
8 2025-01-09  2025-01-04/2025-01-10         2
9 2025-01-10  2025-01-04/2025-01-10         2

# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2025-12-22  2025-12-20/2025-12-26        52
356 2025-12-23  2025-12-20/2025-12-26        52
357 2025-12-24  2025-12-20/2025-12-26        52
358 2025-12-25  2025-12-20/2025-12-26        52
359 2025-12-26  2025-12-20/2025-12-26        52
360 2025-12-27  2025-12-27/2026-01-02        53
361 2025-12-28  2025-12-27/2026-01-02        53
362 2025-12-29  2025-12-27/2026-01-02        53
363 2025-12-30  2025-12-27/2026-01-02        53
364 2025-12-31  2025-12-27/2026-01-02        53
SeaBean
  • 22,547
  • 3
  • 13
  • 25