I used pandas to get a list of all email duplicates. However, not every duplicate email is actually a duplicate contact: in a small company, for example, all employees may share the same email address.
Email | FirstName | LastName | Phone | Mobile | Company |
---|---|---|---|---|---|
a@company-a.com | John | Doe | 12342 | 65464 | Company_a |
a@company-a.com | John | Doe | 43214 | 45645 | Comp_ny A |
a@company-a.com | Adam | Smith | 34223 | 46456 | Company A |
b@company-b.com | Bill | Gates | 23423 | 63453 | Company B |
b@company-b.com | Bill | Gates | 32421 | 43244 | Comp B |
b@company-b.com | Elon | Musk | 42342 | 34234 | Company B |
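For reference, the sample data above can be loaded into a DataFrame and the email-only check reproduced. This is a minimal sketch. The column names (`Email`, `FirstName`, ...) are assumed from the table, and because the original code isn't shown, `duplicated(subset="Email", keep=False)` is only one plausible way that list was produced.

```python
import pandas as pd

# Sample data from the table; column names are assumed
df = pd.DataFrame({
    "Email": ["a@company-a.com"] * 3 + ["b@company-b.com"] * 3,
    "FirstName": ["John", "John", "Adam", "Bill", "Bill", "Elon"],
    "LastName": ["Doe", "Doe", "Smith", "Gates", "Gates", "Musk"],
    "Phone": [12342, 43214, 34223, 23423, 32421, 42342],
    "Mobile": [65464, 45645, 46456, 63453, 43244, 34234],
    "Company": ["Company_a", "Comp_ny A", "Company A",
                "Company B", "Comp B", "Company B"],
})

# keep=False flags every row in a duplicate group,
# not only the second and later occurrences
email_dupes = df[df.duplicated(subset="Email", keep=False)]
print(email_dupes)
```

On this sample every row shares its email with another row, so Adam Smith and Elon Musk are flagged too. That is exactly the false-positive problem described above.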
That's why I want to filter my list of email duplicates further with an additional condition:
I want to extract all rows whose Email, FirstName and LastName all match another row, because that almost certainly indicates a real duplicate. The extracted DataFrame should look like this:
Email | FirstName | LastName | Phone | Mobile | Company |
---|---|---|---|---|---|
a@company-a.com | John | Doe | 12342 | 65464 | Company_a |
a@company-a.com | John | Doe | 43214 | 45645 | Comp_ny A |
b@company-b.com | Bill | Gates | 23423 | 63453 | Company B |
b@company-b.com | Bill | Gates | 32421 | 43244 | Comp B |
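One way to get this result is to pass all three columns to `DataFrame.duplicated` via `subset`. The sketch below uses the same assumed column names. A row is flagged only if another row matches it on Email, FirstName *and* LastName. Using `keep=False` returns every member of each matching group.

```python
import pandas as pd

# Same sample data as in the question; column names are assumed
df = pd.DataFrame({
    "Email": ["a@company-a.com"] * 3 + ["b@company-b.com"] * 3,
    "FirstName": ["John", "John", "Adam", "Bill", "Bill", "Elon"],
    "LastName": ["Doe", "Doe", "Smith", "Gates", "Gates", "Musk"],
    "Phone": [12342, 43214, 34223, 23423, 32421, 42342],
    "Mobile": [65464, 45645, 46456, 63453, 43244, 34234],
    "Company": ["Company_a", "Comp_ny A", "Company A",
                "Company B", "Comp B", "Company B"],
})

key = ["Email", "FirstName", "LastName"]

# True only where all three key columns match at least one other row
mask = df.duplicated(subset=key, keep=False)
real_dupes = df[mask]
print(real_dupes)
```

This compares the strings exactly, so `john` and `John` would be treated as different people.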
How can I get there? Is it possible to check for equality across multiple columns at once?
I would also appreciate any feedback on best practices.
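You can also check several equality conditions at once by grouping on the key columns. The sketch below uses the same assumed column names and keeps only groups with more than one row via `GroupBy.filter`. On this sample it returns the same rows as the `duplicated` approach. The group-based form is easier to adapt, for example to a different size threshold.

```python
import pandas as pd

# Same sample data as in the question; column names are assumed
df = pd.DataFrame({
    "Email": ["a@company-a.com"] * 3 + ["b@company-b.com"] * 3,
    "FirstName": ["John", "John", "Adam", "Bill", "Bill", "Elon"],
    "LastName": ["Doe", "Doe", "Smith", "Gates", "Gates", "Musk"],
    "Phone": [12342, 43214, 34223, 23423, 32421, 42342],
    "Mobile": [65464, 45645, 46456, 63453, 43244, 34234],
    "Company": ["Company_a", "Comp_ny A", "Company A",
                "Company B", "Comp B", "Company B"],
})

key = ["Email", "FirstName", "LastName"]

# Keep every row belonging to an (Email, FirstName, LastName) group
# that contains more than one row; original row order is preserved
real_dupes = df.groupby(key).filter(lambda g: len(g) > 1)
print(real_dupes)
```

The two approaches can diverge on incomplete data. `duplicated` treats missing values as equal to each other, whereas `groupby` drops rows whose key contains NaN by default (`dropna=True`).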
Thank you!