I have a function that calls `df.drop_duplicates(subset=["Column"], keep='last')` multiple times, which takes a lot of time. How can I speed this up? Is there a faster way to remove duplicate rows from a DataFrame by a given column (or columns) while keeping the last one?

Berkos
    Set a boolean flag on each row you want to remove and at the end of your process remove them. You can use `duplicated` instead of `drop_duplicates` to get this. – Corralien May 16 '23 at 13:13
  • 1
    Check out https://stackoverflow.com/questions/54196959/is-there-any-faster-alternative-to-col-drop-duplicates for some suggestions. – JonSG May 16 '23 at 14:03

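A minimal sketch of the approach Corralien suggests, using a toy DataFrame and a single key column as placeholders for the real data (neither is shown in the question): compute a boolean mask with `duplicated` instead of rebuilding the DataFrame on every call, and do the actual row removal only once at the end of the process.

```python
import pandas as pd

# Hypothetical data; the question does not show the real DataFrame or the
# surrounding steps that call drop_duplicates repeatedly.
df = pd.DataFrame({
    "Column": ["a", "b", "a", "c", "b", "a"],
    "value":  [1, 2, 3, 4, 5, 6],
})

# drop_duplicates() rebuilds the DataFrame on every call.  duplicated() only
# computes a boolean mask, so the expensive filtering can be deferred until
# the end of the process and done in a single pass.
mask = df.duplicated(subset=["Column"], keep="last")  # True for rows to drop
deduped = df[~mask]

# For a given state of df this is equivalent to the original call:
assert deduped.equals(df.drop_duplicates(subset=["Column"], keep="last"))
```

If the function flags rows at several points, the masks can be combined (e.g. with `|=` on a boolean Series aligned to `df.index`) and applied once, so the DataFrame is only copied a single time instead of once per `drop_duplicates` call.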
0 Answers