I use pandas.DataFrame.drop_duplicates() to drop rows where all column values are identical. However, for data quality analysis, I also need a DataFrame containing the duplicate rows that were dropped. How can I identify which rows will be dropped? One option is to compare the original DataFrame against the deduplicated one and find the indexes that are missing, but is there a better way to do this?

Example:

import pandas as pd

data =[[1,'A'],[2,'B'],[3,'C'],[1,'A'],[1,'A']]

df = pd.DataFrame(data,columns=['Numbers','Letters'])

df.drop_duplicates(keep='first',inplace=True) # This will drop rows 3 and 4

# Now how to create a dataframe with the duplicate records dropped only?
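For context, this is roughly the index-comparison workaround I have in mind (deduped and dropped are just illustrative names):

import pandas as pd

data = [[1, 'A'], [2, 'B'], [3, 'C'], [1, 'A'], [1, 'A']]

df = pd.DataFrame(data, columns=['Numbers', 'Letters'])

# Deduplicate into a new frame instead of in place,
# then keep the rows whose index is missing from the deduplicated result.
deduped = df.drop_duplicates(keep='first')

dropped = df.loc[df.index.difference(deduped.index)]  # rows at index 3 and 4

It works, but it needs an extra copy of the DataFrame, which is why I am asking whether there is a more direct way.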

1 Answer

import pandas as pd

data =[[1,'A'],[2,'B'],[3,'C'],[1,'A'],[1,'A']]

df = pd.DataFrame(data,columns=['Numbers','Letters'])


df.drop_duplicates()  # keeps the first occurrence of each duplicated row and drops the rest

Output

    Numbers Letters
0   1       A
1   2       B
2   3       C

and, to get only the duplicate rows that would be dropped:

df.loc[df.duplicated()]  # duplicated() marks every occurrence after the first as True

Output

    Numbers Letters
3   1       A
4   1       A
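
As a side note (not part of the question, but standard pandas behavior): duplicated() accepts the same keep argument as drop_duplicates(), so keep=False flags every member of a duplicated group, including the first occurrence.

df.loc[df.duplicated(keep=False)]  # all rows that have at least one duplicate

Output

    Numbers Letters
0   1       A
3   1       A
4   1       A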