I adapted this approach to your case.
Basically you have to deal with NaT
values. I've chosen to treat them as 0
.
And round the month to a integer, if you desire so.
As in your sample, there is a duplicated row: "1", "it", "2019-03-01"
This row results in 7, 1, it, 2019-03-01, 0 days, 0
(because it was considered sorted unique rows as input)
For this case, it seems to work, though I 've not tested in another ones.
import pandas as pd
df = pd.DataFrame(columns=["id", "country", "date"]
, data=[
["1", "ar", "2019-01-01"],
["1", "ar", "2019-02-01"],
["1", "ar", "2019-03-01"],
["1", "it", "2019-01-01"],
["1", "it", "2019-02-01"],
["1", "it", "2019-03-01"],
["1", "it", "2019-04-01"],
["1", "it", "2019-03-01"],
["2", "ar", "2019-01-01"],
["2", "ar", "2019-02-01"],
["2", "ar", "2019-03-01"],
["2", "it", "2019-01-01"],
["2", "it", "2019-02-01"],
["3", "it", "2019-03-01"],
["3", "it", "2019-04-01"],
["4", "it", "2019-05-01"]
])
df["date"] = pd.to_datetime(df["date"])
df['daysdiff'] = df.sort_values('date').groupby(['id','country'])['date'].diff()
df['monthsdiff'] = (
df
.sort_values('date')
.groupby(['id','country'])['date']
.diff()
# 365.25 [days/year] / (12 [months/year]) = 30.4375 [days/month]
.div(pd.Timedelta(days=365.25/12), fill_value="0")
.round()
.astype(int)
)
print(df)
# id country date daysdiff monthsdiff
# 0 1 ar 2019-01-01 NaT 0
# 1 1 ar 2019-02-01 31 days 1
# 2 1 ar 2019-03-01 28 days 1
# 3 1 it 2019-01-01 NaT 0
# 4 1 it 2019-02-01 31 days 1
# 5 1 it 2019-03-01 28 days 1
# 6 1 it 2019-04-01 31 days 1
# 7 1 it 2019-03-01 0 days 0
# 8 2 ar 2019-01-01 NaT 0
# 9 2 ar 2019-02-01 31 days 1
# 10 2 ar 2019-03-01 28 days 1
# 11 2 it 2019-01-01 NaT 0
# 12 2 it 2019-02-01 31 days 1
# 13 3 it 2019-03-01 NaT 0
# 14 3 it 2019-04-01 31 days 1
# 15 4 it 2019-05-01 NaT 0