I am running into an issue where df.duplicated() erroneously returns True. When I reset the index (df.reset_index()), df.duplicated() returns the correct result.
This issue was raised in 2013; however, the cause was not identified, only a workaround. I am experiencing the problem now after reading data in from an SQL database. I would greatly appreciate it if someone has a solution, as I don't want to have to resort to resetting the index of a df every time I need to run the .duplicated() method.
I get the following when I display the 'duplicates' using df[df.duplicated()]:
name         type  code
John Doe     A     6532
Jane Doe     A     1124
Rudolph Doe  B     3412
None of these are duplicated. After I perform df.reset_index() I get completely different (and correct) results.
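To make the comparison concrete, here is a minimal sketch on made-up data. The frame, its values, and its index labels are hypothetical and are not the rows from my SQL database, so this may not reproduce the exact problem. It only shows the two calls side by side. Two things it relies on: duplicated() compares column values and ignores the index, and reset_index() without drop=True moves the old index into a new column, which then takes part in the comparison.

```python
import pandas as pd

# Hypothetical data for illustration only (not the real SQL result).
df = pd.DataFrame(
    {
        "name": ["John Doe", "Jane Doe", "John Doe"],
        "type": ["A", "A", "A"],
        "code": [6532, 1124, 6532],
    },
    index=[10, 11, 12],  # arbitrary, non-default index labels
)

# duplicated() looks at column values only; the index is ignored.
# By default (keep="first") only the later occurrences are flagged.
dup_mask = df.duplicated()

# keep=False flags every member of a duplicate group, which makes it
# easier to see which rows are being treated as copies of each other.
all_dups = df[df.duplicated(keep=False)]

# reset_index() turns the old index into a column, so rows that share
# column values but have different index labels no longer match.
reset_mask = df.reset_index().duplicated()

# reset_index(drop=True) discards the old index instead of keeping it.
drop_mask = df.reset_index(drop=True).duplicated()

print(dup_mask.tolist())
print(all_dups)
print(reset_mask.tolist())
print(drop_mask.tolist())
```

If the real data behaves like this sketch, the difference between the two results would come from the extra index column rather than from duplicated() itself, but that is only a guess on my part.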
I'm quite confused and have scoured the Internet for a solution. I appreciate any help one could provide.
I'm using the latest Pandas (0.19.1) release. However, I tried this with 0.18 and had the same problem.