I have this data frame
customer_id | customer_location | customer_contact_id | customer_contact_location |
---|---|---|---|
1 | ES | 10 | DE |
1 | ES | 11 | DE |
1 | ES | 12 | FR |
2 | FR | 20 | GB |
3 | ES | 87 | ES |
3 | ES | 88 | ES |
I need to transpose it in a way so there is one row per customer_id. Like this.
customer_id | customer_location | customer_contact_id1 | customer_contact_id2 | customer_contact_id3 | customer_contact_location1 | customer_contact_location2 | customer_contact_location3 |
---|---|---|---|---|---|---|---|
1 | ES | 10 | 11 | 12 | DE | DE | FR |
2 | FR | 20 | GB | ||||
3 | ES | 87 | 88 | ES | ES |
So to outline what the python script should do.
- Take the first data frame as input.
- Transpose the contact level data (customer_contact_id and customer_contact_location) so that there is one row per customer_id.
- Whatever the maximum amount of customer_contact_id's per any particular customer_id in the entire dataframe should be the number of columns created per each contact level attribute. In this example the maximum number is 3, so therefore there are 3 customer_contact id columns and 3 customer contact location columns.
- The script should be dynamic so the user can define the columns by position that should not be transposed and those that should. E.g. If I add a new customer attribute (e.g. customer_postal_code) or a new customer_contact attribute that needs to be transposed (e.g. customer_contact_name), there should be parameters where those columns could be added.
customer level columns
customer_cols = ['customer_id', 'customer_location']
contact level columns
contact_cols = ['customer_contact_id', 'customer_contact_location']
Number of columns to be created for each contact attribute
max_contact_ids = rows.groupby('customer_id').size().max()