Create Dummy Columns for values in Single Pandas Column and Group into single row

Question

I am trying to take a pandas dataframe and perform a pivot like operation on a single column. I want to take multiple rows (grouped by some identification columns) and convert that single column into dummy indicator variables. I know of pd.get_dummies() but I want to condense the multiple rows into a single row.

An example below:

import pandas as pd
import numpy as np

# starting data
d = {'ID': [1,1,1,2,2,3,3,3], 
     'name': ['bob','bob','bob','shelby','shelby','jordan','jordan','jordan'],
     'type': ['type1','type2','type4','type1','type6','type5','type8','type2']}
df: pd.DataFrame = pd.DataFrame(data=d)
print(df.head(9))

   ID    name   type
0   1     bob  type1
1   1     bob  type2
2   1     bob  type4
3   2  shelby  type1
4   2  shelby  type6
5   3  jordan  type5
6   3  jordan  type8
7   3  jordan  type2

I would like the end result to look like:

   ID    name  type1  type2  type4  type5  type6  type8
0   1     bob      1      1      1      0      0      0
1   2  shelby      1      0      0      0      1      0
2   3  jordan      0      1      0      1      0      1

Zeel B Patel · Accepted Answer · 2021-02-12T17:33:34.587

3

Here you go!

pd.get_dummies(df, columns=['type'], prefix='', prefix_sep='').groupby(['ID','name']).max().reset_index()

prefix='' and prefix_sep='' avoid adding extra prefix such as type_type1, type_type2 in columns names.
groupby(this_columns) lets you aggregate values from other columns by this_columns.
max() on groupby() will provide you a maximum of aggregated values, as an example, for bob, you will have type1 values 1,0,0 in dummy format, but you want it 1 if any of it is 1, so max() works here.
reset_index() gives your columns ID and name back which were converted to indices by groupby().

edited Feb 12 '21 at 17:33

answered Feb 12 '21 at 17:14

Zeel B Patel

691
5
16

1

Thank you, can you explain how ```.max()``` works in this context? Maybe step through exactly what the code is doing? – Coldchain9 Feb 12 '21 at 17:17
Updated it. Let me know if it is still unclear or maybe you wanna try to breakdown the code and explore :) – Zeel B Patel Feb 12 '21 at 17:27
Kindly help me with an upvote and accepted answer if my half an hour investment helped you :) – Zeel B Patel Feb 12 '21 at 17:36
1

absolutely. Appreciate the help. I believe this solution is accomplishing what I need – Coldchain9 Feb 12 '21 at 17:42
Happy to help :) – Zeel B Patel Feb 12 '21 at 17:43

score 1 · Answer 2 · answered Feb 12 '21 at 16:58

You can use the pandas.DataFrame.pivot_table method (documentation here)

df.pivot_table(index=['ID'], columns=['type'], aggfunc='count', fill_value=0)

Outputs

type type1 type2 type4 type5 type6 type8
ID                                      
1        1     1     1     0     0     0
2        1     0     0     0     1     0
3        0     1     0     1     0     1

I wasn't able to get your desired output with a single line method call, you will need to merge both dataframes and keep the columns you wanted.

You will notice that the pivot_table method returns a dataframe where the column ID is the index.

Create Dummy Columns for values in Single Pandas Column and Group into single row

2 Answers2