I'm working on building a bridge table between two tables with a many-to-many relationship. Table A contains employee IDs and job assignments (one employee can have more than one assignment). Table B has the same structure. The aim is to build a reconciliation report to find misalignment of job assignments for each employee between Table A and Table B.
To create the bridge table I created a union of Table A and Table B using the concat function. This creates a table with all the employee IDs and the different job assignments. Below is a table showing all the assignments for one employee:
employee_id job_assignment
001 sales
001 marketing
001 management
001 sales
001 marketing
001 HR
But since this is a bridge table I don't want all these duplicate records. I want the output to look like this:
employee_id job_assignment
001 sales
001 marketing
001 management
001 HR
I then run bridge.drop_duplicates(inplace = True)
to drop all the duplicate records, but it is failing to do so.
The datatype for both of these columns are 'object'
I'm still pretty new to manipulating data and building dataframes in pandas and assistance would be much appreciated.