I have a function that calls `df.drop_duplicates(subset=["Column"], keep='last')`
multiple times, and it takes a lot of time. How can I speed this up? Is there another, faster way to remove duplicate rows from a DataFrame
by a given column (or columns) while keeping the last occurrence?
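For context, the pattern is roughly the following. This is a hypothetical sketch, since the actual function isn't shown; the function name, loop, and column name `"Column"` are placeholders:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical reconstruction of the slow pattern: every iteration
    # rebuilds the DataFrame via drop_duplicates, which adds up quickly.
    for _ in range(100):
        # ... some step that appends or modifies rows ...
        df = df.drop_duplicates(subset=["Column"], keep="last")
    return df
```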

Berkos
- Set a boolean flag on each row you want to remove and at the end of your process remove them. You can use `duplicated` instead of `drop_duplicates` to get this. – Corralien May 16 '23 at 13:13
- Check out https://stackoverflow.com/questions/54196959/is-there-any-faster-alternative-to-col-drop-duplicates for some suggestions. – JonSG May 16 '23 at 14:03
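A minimal sketch of the approach Corralien suggests in the first comment: mark the rows to discard with `duplicated()` while processing, then drop them once at the end instead of calling `drop_duplicates` on every pass. The toy DataFrame and the `value` column are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Column": [1, 1, 2, 2, 3], "value": [10, 20, 30, 40, 50]})

# duplicated(keep='last') returns True for every occurrence except the last,
# i.e. exactly the rows drop_duplicates(keep='last') would discard.
to_drop = df.duplicated(subset=["Column"], keep="last")

# ... keep updating this boolean flag during the rest of the process ...

# Drop all flagged rows in a single pass at the very end.
result = df[~to_drop]
print(result)
#    Column  value
# 1       1     20
# 3       2     40
# 4       3     50
```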