How to drop_duplicate using different condition per group?

Question

I have dataFrame and I need to drop duplicates per group ('col1') based on a minimum value in another column 'abs(col1 - col2)', but I need to change this condition for the last group by taking the max value in 'abs(col1 - col2)' that corresponding to the last group in 'col1' where I sorted the 'col1' with ascending order. (to behave as a loop)

Update 1 :

I need to assign the last group dynamically.

for example, If I have a data frame as

creating DataFrame

df = pd.DataFrame( {'col0':['A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],'col1':[1,1,1,2,2,2,3,3,3,4,4,4,2,2,2,3,3,3,4,4,4,5,5,5], 'col2':[2,3,4,1,3,4,1,2,4,1,2,3,3,4,5,2,4,5,2,3,5,2,3,4]})

compute Diff column (this column will be used as a condition)

df['abs(col1 - col2)']=abs(df['col1']-df['col2'])

The original Df as follow :

The desired Df should looks like:

my trial:

df.sort_values(by=['col0','col1','abs(col1 - col2)','col2'],ascending=[True,True,True,False]).drop_duplicates(['col0','col1'])
the resulting as follow:

onepan · Accepted Answer · 2019-03-28T15:52:43.253

Updated:

If I understand correctly, you want each a different sort order each time you reach a maximum on col1.

Extract the groups where sorting is different:

df.groupby(['col0'], as_index=False)['col1'].max()

Deduplicate df as you did
Deduplicate just the groups found in step 1 with the correct sorting. You can get these groups by merging with the original df:

pd.merge(df, col1_max_groups)

Update the deduplicated DataFrame with the new values

Full example:

col1_max_groups = df.groupby(['col0'], as_index=False)['col1'].max()
deduped = df.sort_values(['col0', 'col1', 'abs(col1 - col2)', 'col2'], 
                         ascending=[True, True, True, False]) \
    .drop_duplicates(['col0', 'col1']) \
    .set_index(['col0', 'col1'])
update = pd.merge(df, col1_max_groups) \
    .sort_values(['col0', 'col1', 'abs(col1 - col2)', 'col2'], 
                 ascending=[True, True, False, False]) \
    .drop_duplicates(['col0', 'col1'])
deduped.update(update.set_index(['col0', 'col1']))
deduped.reset_index()

# returns
# col0  col1  col2  abs(col1 - col2)
#    A     1     2                 1
#    A     2     3                 1
#    A     3     4                 1
#    A     4     1                 3
#    B     2     3                 1
#    B     3     4                 1
#    B     4     5                 1
#    B     5     2                 3

score 1 · Answer 2 · answered Mar 28 '19 at 03:59

If this particular result is what you want to have you can split the dataframe and use two different rules, and then concat them again. As example:

import pandas as pd

df = pd.DataFrame( {'col1':[1,1,1,2,2,2,3,3,3,4,4,4], 'col2':[2,3,4,1,3,4,1,2,4,1,2,3]})

df['abs(col1 - col2)']=abs(df['col1']-df['col2'])

df = df.sort_values(by=['col1','abs(col1 - col2)','col2'],ascending=[True,True,False]).drop_duplicates('col1')
df1 = df.loc[df['col1'] != 4]

df2 = df.loc[df['col1'] == 4]
df2 = df2.sort_values(by=['col1','abs(col1 - col2)','col2'],ascending=[True,True,False])
df2Last = df2.tail(1)

df = pd.concat([df1, df2Last])

result:

col1  col2  abs(col1 - col2)
 1     2                 1
 2     3                 1
 3     4                 1
 4     1                 3

Thanks for your answer. I don't need to assign ```4``` directly. If I have another column considered as a group column as illustrated in update 1. then I need to obtain the desired ```DF``` dynamically. — Sidhom, Mar 28 '19 at 04:53

How to drop_duplicate using different condition per group?

Update 1 :

compute Diff column (this column will be used as a condition)

2 Answers2

Updated: