I have a function that calls `df.drop_duplicates(subset=["Column"], keep='last')`
multiple times, and it takes a lot of time. How can I speed this up? Is there another, faster way to remove duplicate rows from a DataFrame
by a given column (or columns) while keeping the last occurrence?
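For context, the pattern is roughly the following. This is a hypothetical sketch, since the actual function isn't shown; the function name, loop, and column name `"Column"` are placeholders:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical reconstruction of the slow pattern: every iteration
    # rebuilds the DataFrame via drop_duplicates, which adds up quickly.
    for _ in range(100):
        # ... some step that appends or modifies rows ...
        df = df.drop_duplicates(subset=["Column"], keep="last")
    return df
```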

Berkos
- Set a boolean flag on each row you want to remove and at the end of your process remove them. You can use `duplicated` instead of `drop_duplicates` to get this. – Corralien May 16 '23 at 13:13
- Check out https://stackoverflow.com/questions/54196959/is-there-any-faster-alternative-to-col-drop-duplicates for some suggestions. – JonSG May 16 '23 at 14:03
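A minimal sketch of the approach Corralien suggests in the first comment: mark the rows to discard with `duplicated()` while processing, then drop them once at the end instead of calling `drop_duplicates` on every pass. The toy DataFrame and the `value` column are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Column": [1, 1, 2, 2, 3], "value": [10, 20, 30, 40, 50]})

# duplicated(keep='last') returns True for every occurrence except the last,
# i.e. exactly the rows drop_duplicates(keep='last') would discard.
to_drop = df.duplicated(subset=["Column"], keep="last")

# ... keep updating this boolean flag during the rest of the process ...

# Drop all flagged rows in a single pass at the very end.
result = df[~to_drop]
print(result)
#    Column  value
# 1       1     20
# 3       2     40
# 4       3     50
```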