0

Suppose I have a certain defined range of dates, e.g. 2022-12-25 to 2023-02-05. I want to find all fully closed periods (specified by the user) that fit into this date range. For example, if the user specifies months and decades, the method should return

  • 2023-01-01 to 2023-01-10, decade 1
  • 2023-01-11 to 2023-01-20, decade 2
  • 2023-01-21 to 2023-01-31, decade 3
  • 2023-01-01 to 2023-01-31, month 1

Another example would be finding all fully closed seasons (DJF, MMA, JJA, SON) and years in 2021-12-31 to 2023-02-28, which should result in

  • 2022-01-01 to 2022-12-31, year 1
  • 2022-03-01 to 2022-04-30, season 1 (MMA, spring)
  • 2022-05-01 to 2022-08-31, season 2 (JJA, summer)
  • 2022-09-01 to 2022-11-30, season 3 (SON, fall)
  • 2022-12-01 to 2023-02-28, season 4 (DJF, winter)

Is there a way to find this in pandas easily without many if/else conditions?

Droid
  • 441
  • 1
  • 3
  • 18
  • Hi. StackOverflow is not a coding service. What have you tried so far? – alec_djinn Feb 20 '23 at 13:58
  • Unfortunately I'm out of ideas because I couldn't find any other similar question and I'm not sure how one could check all `pd.Period` that fit into a `pd.date_range` – Droid Feb 20 '23 at 14:03
  • Check https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html and see if something comes to mind.... – alec_djinn Feb 20 '23 at 14:05
  • may go with pd.cut – BENY Feb 20 '23 at 14:29
  • I've actually figured out all cases for monthly, year and season but cannot really figure out how to do the one for the decades as there is no way to define when to start the `10D` frequency as far as i know. – Droid Feb 20 '23 at 14:36

1 Answers1

0

In case someone has the same issue this is the solution I found. It is not super clean but it seems to work.

Basically for the monthly, seasonal and yearly cycle it is relatively easy to do as you can specify the frequency directly when creating the date range, so for example, starting from the seasonal frequency:

season_start = pd.date_range(start='2022-12-25',
              end='2024-02-05',
              freq='QS-MAR')
season_end = (season_start - pd.to_timedelta('1 day'))

season_start[:-1].strftime('%Y-%m-%d'), season_end[1:].strftime('%Y-%m-%d')

which gives

['2023-03-01', '2023-06-01', '2023-09-01']
['2023-05-31', '2023-08-31', '2023-11-30']

You can apply exactly the same code for monthly and yearly frequency by using freq=MS and freq=YS, respectively.

For the decade it's little bit more difficult but this seems to work

daterange = pd.date_range(start='2023-02-05',
              end='2023-04-01',
              freq='D')

daterange = daterange[daterange.day.isin([1, 11, 21])]

daterange_end = (daterange - pd.to_timedelta('1 day'))

daterange[:-1].strftime('%Y-%m-%d'), daterange_end[1:].strftime('%Y-%m-%d')

which gives

['2023-02-11', '2023-02-21', '2023-03-01', '2023-03-11', '2023-03-21']
['2023-02-20', '2023-02-28', '2023-03-10', '2023-03-20', '2023-03-31']
Droid
  • 441
  • 1
  • 3
  • 18