Python script to transpose certain column's values into separate columns

Question

I have this data frame

customer_id	customer_location	customer_contact_id	customer_contact_location
1	ES	10	DE
1	ES	11	DE
1	ES	12	FR
2	FR	20	GB
3	ES	87	ES
3	ES	88	ES

I need to transpose it in a way so there is one row per customer_id. Like this.

customer_id	customer_location	customer_contact_id1	customer_contact_id2	customer_contact_id3	customer_contact_location1	customer_contact_location2	customer_contact_location3
1	ES	10	11	12	DE	DE	FR
2	FR	20			GB
3	ES	87	88		ES	ES

So to outline what the python script should do.

Take the first data frame as input.
Transpose the contact level data (customer_contact_id and customer_contact_location) so that there is one row per customer_id.
Whatever the maximum amount of customer_contact_id's per any particular customer_id in the entire dataframe should be the number of columns created per each contact level attribute. In this example the maximum number is 3, so therefore there are 3 customer_contact id columns and 3 customer contact location columns.
The script should be dynamic so the user can define the columns by position that should not be transposed and those that should. E.g. If I add a new customer attribute (e.g. customer_postal_code) or a new customer_contact attribute that needs to be transposed (e.g. customer_contact_name), there should be parameters where those columns could be added.

customer level columns

customer_cols = ['customer_id', 'customer_location']

contact level columns

contact_cols = ['customer_contact_id', 'customer_contact_location']

Number of columns to be created for each contact attribute

max_contact_ids = rows.groupby('customer_id').size().max()

Welcome to Stack Overflow, Kristina. I was looking for a question but I didn't found any. Please try to better explain your issue. Are you looking for someone to write code for you? If so, StackOverflow might not be the right place. — Pawel Kam, Mar 01 '23 at 17:21

Chrysophylaxs · Answer 1 · 2023-03-01T18:25:01.357

Here is one way:

import pandas as pd

df = pd.read_clipboard() # Your data here

customer_cols = ['customer_id', 'customer_location']
contact_cols = ['customer_contact_id', 'customer_contact_location']

# First create a new dummy index, then pivot
out = df.assign(
    new_cols=df.groupby("customer_id").cumcount().add(1)
).pivot(
    index=customer_cols,
    columns="new_cols",
    values=contact_cols
)

# Finally, fix the indices to your format:
out = out.set_axis(
    out.columns.map("%s%d".__mod__),
    axis=1
).reset_index()

out:

   customer_id customer_location customer_contact_id1 customer_contact_id2  \
0            1                ES                   10                   11   
1            2                FR                   20                  NaN   
2            3                ES                   87                   88   

  customer_contact_id3 customer_contact_location1 customer_contact_location2  \
0                   12                         DE                         DE   
1                  NaN                         GB                        NaN   
2                  NaN                         ES                         ES   

  customer_contact_location3  
0                         FR  
1                        NaN  
2                        NaN  

[3 rows x 8 columns]

Python script to transpose certain column's values into separate columns

customer level columns

contact level columns

Number of columns to be created for each contact attribute

1 Answers1