How can I drop duplicates in pandas without dropping NaN values

Question

I have a dataframe that I query, and I want to keep only the rows with unique values in a certain column.
I tried this:

    database = pd.read_csv(db_file, sep='\t')
    query = database.loc[database[db_specification[0]].isin(elements)].drop_duplicates(subset=db_specification[1])

db_specification is just a list containing the names of the two columns that I query.
Some of the values are NaN, and I don't want them to be treated as duplicates of each other. How can I achieve that?

Statistic Dean · Accepted Answer · 2020-08-13T14:12:12.673

You can set aside all rows that contain NaN, drop duplicates only among the remaining rows, and then concatenate the two parts back together.

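    # Rows that contain at least one NaN (axis=1 checks across the columns of each row)
    mask = data.isna().any(axis=1)
    # Keep every NaN row; drop duplicates only among the remaining rows
    data = pd.concat([data[mask], data[~mask].drop_duplicates()])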
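
Since the question deduplicates on a single column, here is a minimal sketch of the same idea restricted to that column. The helper name `drop_duplicates_keep_nan`, the toy data, and the column names `gene` and `accession` are made up for illustration and are not from the question. The call to `sort_index()` assumes the index is unique and sortable, as with the default index from `read_csv`.

```python
import numpy as np
import pandas as pd


def drop_duplicates_keep_nan(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Drop rows whose `column` value repeats, but keep every row where it is NaN."""
    is_nan = df[column].isna()
    # Deduplicate only the rows that hold a real value in `column`.
    deduped = df[~is_nan].drop_duplicates(subset=column)
    # Re-attach the NaN rows and restore the original row order.
    return pd.concat([df[is_nan], deduped]).sort_index()


# Toy data standing in for the TSV file from the question (hypothetical values).
df = pd.DataFrame({
    "gene": ["a", "b", "c", "d", "e"],
    "accession": ["X1", np.nan, "X1", np.nan, "X2"],
})

result = drop_duplicates_keep_nan(df, "accession")

# Equivalent one-liner: keep a row if it is not a repeat OR its value is NaN.
alt = df[~df.duplicated(subset="accession") | df["accession"].isna()]

print(result)
```

Both variants keep the first occurrence of each non-NaN value and every NaN row. The one-liner avoids the concat and preserves row order without sorting.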

edited Aug 13 '20 at 14:12

answered Aug 13 '20 at 11:26


Got an error ```AttributeError: 'DataFrame' object has no attribute 'isNaN'``` – Eliran Turgeman Aug 13 '20 at 12:12
