0

I have this data frame

customer_id customer_location customer_contact_id customer_contact_location
1 ES 10 DE
1 ES 11 DE
1 ES 12 FR
2 FR 20 GB
3 ES 87 ES
3 ES 88 ES

I need to transpose it in a way so there is one row per customer_id. Like this.

customer_id customer_location customer_contact_id1 customer_contact_id2 customer_contact_id3 customer_contact_location1 customer_contact_location2 customer_contact_location3
1 ES 10 11 12 DE DE FR
2 FR 20 GB
3 ES 87 88 ES ES

So to outline what the python script should do.

  1. Take the first data frame as input.
  2. Transpose the contact level data (customer_contact_id and customer_contact_location) so that there is one row per customer_id.
  3. Whatever the maximum amount of customer_contact_id's per any particular customer_id in the entire dataframe should be the number of columns created per each contact level attribute. In this example the maximum number is 3, so therefore there are 3 customer_contact id columns and 3 customer contact location columns.
  4. The script should be dynamic so the user can define the columns by position that should not be transposed and those that should. E.g. If I add a new customer attribute (e.g. customer_postal_code) or a new customer_contact attribute that needs to be transposed (e.g. customer_contact_name), there should be parameters where those columns could be added.

customer level columns

customer_cols = ['customer_id', 'customer_location']

contact level columns

contact_cols = ['customer_contact_id', 'customer_contact_location']

Number of columns to be created for each contact attribute

max_contact_ids = rows.groupby('customer_id').size().max()

  • Welcome to Stack Overflow, Kristina. I was looking for a question but I didn't found any. Please try to better explain your issue. Are you looking for someone to write code for you? If so, StackOverflow might not be the right place. – Pawel Kam Mar 01 '23 at 17:21

1 Answers1

0

Here is one way:

import pandas as pd

df = pd.read_clipboard() # Your data here

customer_cols = ['customer_id', 'customer_location']
contact_cols = ['customer_contact_id', 'customer_contact_location']

# First create a new dummy index, then pivot
out = df.assign(
    new_cols=df.groupby("customer_id").cumcount().add(1)
).pivot(
    index=customer_cols,
    columns="new_cols",
    values=contact_cols
)

# Finally, fix the indices to your format:
out = out.set_axis(
    out.columns.map("%s%d".__mod__),
    axis=1
).reset_index()

out:

   customer_id customer_location customer_contact_id1 customer_contact_id2  \
0            1                ES                   10                   11   
1            2                FR                   20                  NaN   
2            3                ES                   87                   88   

  customer_contact_id3 customer_contact_location1 customer_contact_location2  \
0                   12                         DE                         DE   
1                  NaN                         GB                        NaN   
2                  NaN                         ES                         ES   

  customer_contact_location3  
0                         FR  
1                        NaN  
2                        NaN  

[3 rows x 8 columns]
Chrysophylaxs
  • 5,818
  • 3
  • 10
  • 21