0

So I have a Pandas dataframe with the following structure:

ClientId Product Quantity
01 Apples 2
01 Oranges 3
01 Bananas 1
02 Apples 4
02 Bananas 2

and would like to get this df to look like:

ClientId Product_Apples Quantity_Apples Product_Oranges Quantity_Oranges Product_Bananas Quantity_Bananas
01 1 2 1 3 1 1
02 1 4 0 0 1 2

where the columns starting with Product are binary variables.

What would be the Python code for this transformation?

wlog
  • 25
  • 3

1 Answers1

2

Assuming your dataset is a pandas dataframe, you can try this:

import pandas as pd

df = pd.DataFrame({
    'ClientID': ['01']*3 + ['02']*2,
    'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [2, 3, 1, 4, 2]
})

df['product_cnt'] = 1

# solution 1
df.pivot(index='ClientID', columns='Product', values=['Quantity', 'product_cnt']).fillna(0)

# solution 2
df_products = df.pivot(index='ClientID', columns='Product', values='product_cnt').fillna(0).add_prefix('Product_')
df_qty = df.pivot(index='ClientID', columns='Product', values='Quantity').fillna(0).add_prefix('Quantity_')
df_products.merge(df_qty, on='ClientID', how='left')

Juliette B
  • 186
  • 1
  • 5