Problem with pandas drop_duplicates removing empty values

Question

I'm using drop_duplicates to remove duplicates from my DataFrame based on one column. The problem is that this column is empty for some entries, and those rows end up being removed too. Is there a way to make the function ignore empty values? Here is an example:

    Title                  summary                  
0   TITLE A                summaryA       
1   TITLE A                summaryB  
2                          summaryC       
3                          summaryD

Using this:

    data.drop_duplicates(subset="Title",
                         keep='first', inplace=True)

I get a result like this:

    Title                  summary                  
0   TITLE A                summaryA        
2                          summaryC

But since the last two rows are not duplicates, I want to keep them. Is there a way for drop_duplicates to ignore empty values?
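A minimal reproduction, assuming the empty titles are missing values (None/NaN) rather than empty strings, shows why the fourth row disappears: `duplicated()`, which `drop_duplicates` relies on, treats missing values as equal to each other.

```python
import pandas as pd

# Sample data from the question; empty titles assumed to be None
df = pd.DataFrame({
    "Title": ["TITLE A", "TITLE A", None, None],
    "summary": ["summaryA", "summaryB", "summaryC", "summaryD"],
})

# Missing values compare equal here, so row 3 is flagged as a repeat of
# row 2, just as row 1 is flagged as a repeat of row 0.
flags = df.duplicated(subset="Title", keep="first")
print(flags.tolist())  # [False, True, False, True]
```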

score 0 · Answer 1 · answered Apr 30 '20 at 08:03

Fill the missing values with each row's index number. Maybe not the prettiest way, but it works:

    import pandas as pd

    df = pd.DataFrame(
        {'Title': ['TITLE A', 'TITLE A', None, None],
         'summary': ['summaryA', 'summaryB', 'summaryC', 'summaryD']}
    )

    # Unique fallback key for every row: its index as a string
    df['_id'] = df.index.astype(str)
    # Replace missing titles with that key so they never match each other
    df['Title2'] = df['Title'].fillna(df['_id'])

    df.drop_duplicates(subset='Title2', keep='first')
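A different approach that avoids helper columns is to build a boolean mask: keep a row if its title is empty, or if it is the first occurrence of that title. This is a minimal sketch; the function name `drop_duplicates_keep_empty` is hypothetical, and "empty" is assumed to mean either a missing value (None/NaN) or an empty string `""`.

```python
import pandas as pd


def drop_duplicates_keep_empty(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Drop rows whose `column` value repeats an earlier row, except rows where
    `column` is empty (missing or an empty string), which are always kept."""
    values = df[column]
    # True for rows whose value is None/NaN or ""
    is_empty = values.isna() | values.eq("")
    # True for the first occurrence of each value, False for later repeats
    is_first = ~df.duplicated(subset=column, keep="first")
    return df[is_empty | is_first]


df = pd.DataFrame({
    "Title": ["TITLE A", "TITLE A", None, None],
    "summary": ["summaryA", "summaryB", "summaryC", "summaryD"],
})
print(drop_duplicates_keep_empty(df, "Title"))
```

Unlike `inplace=True`, this returns a new DataFrame, so reassign the result, e.g. `data = drop_duplicates_keep_empty(data, "Title")`.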

score 0 · Answer 2 · edited May 30 '23 at 14:55

You can do this:

    data.drop_duplicates(subset="Title",
                         keep='last', inplace=True)

edited May 30 '23 at 14:55 by Jorge Luis
answered May 30 '23 at 08:55 by Nalandeep Govande
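For comparison, a quick check of what `keep='last'` does on the same sample data, assuming the empty titles are missing values as in the first answer: `keep` only chooses which row of each duplicate group survives, and the two rows with missing titles still form one group.

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["TITLE A", "TITLE A", None, None],
    "summary": ["summaryA", "summaryB", "summaryC", "summaryD"],
})

# keep="last" keeps the last row of each duplicate group instead of the first;
# one of the two rows with a missing title is still dropped.
last = df.drop_duplicates(subset="Title", keep="last")
print(last["summary"].tolist())  # ['summaryB', 'summaryD']
```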
