Pandas drop_duplicates only possible after to_csv and read_csv

Asked Mar 26 '20 at 17:23

I have two DataFrames that I combine, and the result definitely contains duplicates, as shown later:

total_scrobbles = total_scrobbles.append(new_scrobbles)

After that, the drop_duplicates function doesn't do anything. Not a single row is deleted.

total_scrobbles.drop_duplicates(inplace = True)

But if I save the new DataFrame as a CSV, load it again, and call the same function, it works:

total_scrobbles.to_csv('test.csv', index=False)
total_scrobbles = pd.read_csv('test.csv')
total_scrobbles.drop_duplicates(inplace = True)

Now all duplicates are deleted.

So I have found a workaround, but can anybody tell me why this happens? It doesn't make any sense to me. Is there a better solution than a pointless round trip through to_csv and read_csv?

Thanks a lot.

thepic

Have you checked if both DataFrames are exactly the same? – boechat107 Mar 26 '20 at 17:32

My guess is that it has something to do with the data format you're using, but it is impossible to tell without the data. Provide ~10 problematic rows for which the problem reproduces! – LudvigH Mar 26 '20 at 17:39
Yes, some sample data would be good. You could also experiment with the `subset` argument to the `drop_duplicates()` method, to see which columns are causing the problem. – Arne Mar 26 '20 at 20:56
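
The `subset` experiment suggested above could look roughly like the sketch below. The question never shows the data, so everything here is an assumption for illustration: the column names `artist` and `timestamp` are hypothetical, and the cause (the same value stored as an integer in one frame and as a string in the other) is only one plausible explanation, not a confirmed diagnosis. A CSV round trip would hide such a mismatch because `read_csv` re-infers one dtype per column.

```python
import io

import pandas as pd

# Hypothetical data: 'timestamp' holds ints in one frame and strings in the other.
total_scrobbles = pd.DataFrame({"artist": ["A", "B"], "timestamp": [1, 2]})
new_scrobbles = pd.DataFrame({"artist": ["B", "C"], "timestamp": ["2", "3"]})

combined = pd.concat([total_scrobbles, new_scrobbles], ignore_index=True)

# Rows ("B", 2) and ("B", "2") look identical when printed but do not compare equal,
# so nothing is dropped here.
rows_before = len(combined.drop_duplicates())

# Check each column on its own: a column that reports no duplicates
# even though you expect some is a likely culprit.
per_column_dupes = {
    col: int(combined.duplicated(subset=[col]).sum()) for col in combined.columns
}

# Count the distinct Python types stored in the suspicious column.
n_types = combined["timestamp"].map(type).nunique()

# Possible fix without touching the disk: normalise the column to one dtype.
fixed = combined.copy()
fixed["timestamp"] = pd.to_numeric(fixed["timestamp"])
rows_after_fix = len(fixed.drop_duplicates())

# For comparison, the CSV round trip from the question, done in memory.
buf = io.StringIO()
combined.to_csv(buf, index=False)
buf.seek(0)
rows_after_csv = len(pd.read_csv(buf).drop_duplicates())

print(rows_before, per_column_dupes, n_types, rows_after_fix, rows_after_csv)
```

If the per-column check points at a string column instead, stray whitespace or differing capitalisation would be worth ruling out in the same way before calling `drop_duplicates`.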

0 Answers