Removing duplicates appearing in two or more columns in Python

Question

df = pd.read_csv('ABCD.csv', index_col=['A'])
df=df.drop_duplicates(['A'],['B'])

KeyError: Index(['Sample_ID'], dtype='object')

I found out that it is impossible to remove duplicates on the index itself, so I removed the index_col argument from the read_csv call:

df = pd.read_csv('ABCD.csv')
df=df.drop_duplicates(['A'],['B'],keep = 'first')

TypeError: drop_duplicates() got multiple values for argument 'keep'

When I print type(df) it shows "DataFrame", so what could be the problem?

remove `]` and you can use `inplace=True` – Mar 31 '20 at 08:09

score 1 · Accepted Answer · answered Mar 31 '20 at 08:01


I think it should be:

df=df.drop_duplicates(['A', 'B'],keep = 'first')

instead of:

df=df.drop_duplicates(['A'],['B'],keep = 'first')

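The subset must be a single list of columns, not separate arguments. From the documentation: "subset: column label or sequence of labels, optional". In your call, ['B'] is passed as a second positional argument, which is taken as keep, so it clashes with keep = 'first'.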
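For illustration, here is a minimal sketch of the corrected call. It assumes a small made-up DataFrame in place of ABCD.csv; the column C and all the values are hypothetical. With both labels in one list, a row only counts as a duplicate when A and B match together:

```python
import pandas as pd

# Hypothetical sample data standing in for ABCD.csv
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": ["x", "x", "y", "z", "x"],
    "C": [10, 20, 30, 40, 50],
})

# One list for subset: rows are duplicates only if A and B are both equal.
# keep="first" retains the first occurrence of each (A, B) pair.
deduped = df.drop_duplicates(subset=["A", "B"], keep="first")

print(deduped)  # keeps rows 0, 2, 3 and 4; row 1 repeats (1, "x")
```

Rows 2 and 3 both survive because they share A but differ in B.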

PS: You can also use df.drop_duplicates(['A', 'B'], keep='first', inplace=True). With inplace=True you don't need to assign the result back to df.
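One thing to watch with inplace=True, shown here as a small sketch on made-up data (the values are hypothetical): the call modifies df directly and returns None, so assigning its result back to df would throw the data away.

```python
import pandas as pd

# Hypothetical sample data; the second row repeats the first
df = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "x", "y"]})

# With inplace=True the DataFrame is changed directly and None is returned
result = df.drop_duplicates(["A", "B"], keep="first", inplace=True)

print(result)   # None
print(len(df))  # 2
```

So use either df = df.drop_duplicates(...) or the inplace form, but not both together.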


Binh

Thank you, that was embarrassing. I am now using inplace=True as well. – TheUndecided Mar 31 '20 at 08:10
