1

I have a dataframe that looks as such

df = {'CAL_YEAR':[2021,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2023,2023]
'CAL_MONTH' :[12,1,2,3,4,5,6,7,8,9,10,11,12,1,2]}

I want to calculate a months elapsed columns which should look like this

df = {'CUM_MONTH':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]}

how can I do this? my starting month would be 12/2021 or 12/31/2021 (do not care about dates here I only care about the months elapsed). This is economic scenario data but the format of the source data is not in the way we need it.

BigBen
  • 46,229
  • 7
  • 24
  • 40
N27
  • 31
  • 5

2 Answers2

1

IIUC:

multiplier = {'CAL_YEAR': 12, 'CAL_MONTH': 1}
df.assign(
    CUM_MONTH=df[multiplier].diff().mul(multiplier).sum(axis=1).cumsum()
)

    CAL_YEAR  CAL_MONTH  CUM_MONTH
0       2021         12        0.0
1       2022          1        1.0
2       2022          2        2.0
3       2022          3        3.0
4       2022          4        4.0
5       2022          5        5.0
6       2022          6        6.0
7       2022          7        7.0
8       2022          8        8.0
9       2022          9        9.0
10      2022         10       10.0
11      2022         11       11.0
12      2022         12       12.0
13      2023          1       13.0
14      2023          2       14.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thanks. I was able to figure out my own solution. I basically did that but alot less pythonically. Took me like 5 lines of code: – N27 Apr 06 '22 at 16:03
0

I basically did the above method but in numerous steps. Did not use diff() , sum() and cumsum() functions.

start_year = int(data["VALUATION_DATE"][0][-4:])
    data = data.astype({"CAL_YEAR": "int","CAL_MONTH": "int"})
    data["CAL_YEAR_ELAPSED"] = data["CAL_YEAR"]  -  (start_year+1)
    data["CumMonths"] = data["CAL_MONTH"] + 12 * data["CAL_YEAR_ELAPSED"] +1
N27
  • 31
  • 5