Attempting to subtract the values between two different Pandas Series based on corresponding indices

Question

Here are the two series. They were generated after grouping a dataframe based on two column values, patient_id and event_description:

Series 1:

patient_id                                event_description     
A                                             DiagnosisA          2016-09-15
B                                             DiagnosisA          2013-01-01
C                                             DiagnosisA         2016-01-10
D                                             DiagnosisA          2015-01-02

Series 2:

patient_id                                 event_description     
A                                             DiagnosisB          2016-09-30
B                                             DiagnosisB          2013-01-04
D                                             DiagnosisB          2015-01-15

So I essentially Would like to find the difference in time between between each patient. For example for patient A it would return 15 days but for patient C, It could say `NaN.

cs95 · Answer 1 · 2017-10-30T06:10:33.933

3

If the event_description column has no role to play here, I'd suggest just getting rid of it. Make a reset_index call and just subtract the resultants.

r1 = df1.reset_index(level=1, drop=1)
r2 = df2.reset_index(level=1, drop=1)

r = r2.sub(r1)

r
           time
patient_id        
A          15 days
B           3 days
C              NaT
D          13 days

edited Oct 30 '17 at 06:10

answered Oct 30 '17 at 04:35

cs95

379,657
97
704
746

wouldnt the output be zero? – Bharath M Shetty Oct 30 '17 at 05:44
@Bharathshetty If they're datetimes, the result will be a timedelta, unless I'm missing something. – cs95 Oct 30 '17 at 05:45
1

r2 - r2 cancel each other haha :) – Bharath M Shetty Oct 30 '17 at 05:45
@Bharathshetty LOL Thanks buddy. – cs95 Oct 30 '17 at 05:46

piRSquared · Accepted Answer · 2017-10-30T06:19:42.740

Assuming your groupby included both DiagnosisB and DiagnosisA, you could unstack beforehand and subtract

# This should be what you have after groupby
s = pd.concat([s1, s2])

s

patient_id  event_description
A           DiagnosisA          2016-09-15
B           DiagnosisA          2013-01-01
C           DiagnosisA          2016-01-10
D           DiagnosisA          2015-01-02
A           DiagnosisB          2016-09-30
B           DiagnosisB          2013-01-04
D           DiagnosisB          2015-01-15
Name: Date, dtype: datetime64[ns]

Then you can unstack and subtract

s.unstack().pipe(lambda d: d.DiagnosisB - d.DiagnosisA)

patient_id
A   15 days
B    3 days
C       NaT
D   13 days
dtype: timedelta64[ns]

Not Recommended
But fun with unstack and squeeze

s1.unstack().squeeze().rsub(s2.unstack().squeeze())

patient_id
A   15 days
B    3 days
C       NaT
D   13 days
dtype: timedelta64[ns]

Also Not Recommended
But fun with xs

s1.xs('DiagnosisA', level=1).rsub(s2.xs('DiagnosisB', level=1))

patient_id
A   15 days
B    3 days
C       NaT
D   13 days
Name: Date, dtype: timedelta64[ns]

Final Non-Recommendation
Had to show pd.IndexSlice
Note: Indices need to be fully lexsorted for this to work.

s1.loc[pd.IndexSlice[:, 'DiagnosisA']].rsub(s2.loc[pd.IndexSlice[:, 'DiagnosisB']])

patient_id
A   15 days
B    3 days
C       NaT
D   13 days
Name: Date, dtype: timedelta64[ns]

Still not that much familiar with pipes should learn more abt it, its cool. It feels like magic — Bharath M Shetty, Oct 30 '17 at 06:18

score 2 · Answer 3 · answered Oct 30 '17 at 05:54

You can use reset_index, we can also use droplevel i.e

df2.index = df2.index.droplevel(1)
df1.index = df1.index.droplevel(1)

df2 - df1

Output :

           time
patient_id        
A          15 days
B           3 days
C              NaT
D          13 days

Attempting to subtract the values between two different Pandas Series based on corresponding indices

3 Answers3