2

I am trying to take a pandas dataframe and perform a pivot like operation on a single column. I want to take multiple rows (grouped by some identification columns) and convert that single column into dummy indicator variables. I know of pd.get_dummies() but I want to condense the multiple rows into a single row.

An example below:

import pandas as pd
import numpy as np

# starting data
d = {'ID': [1,1,1,2,2,3,3,3], 
     'name': ['bob','bob','bob','shelby','shelby','jordan','jordan','jordan'],
     'type': ['type1','type2','type4','type1','type6','type5','type8','type2']}
df: pd.DataFrame = pd.DataFrame(data=d)
print(df.head(9))

   ID    name   type
0   1     bob  type1
1   1     bob  type2
2   1     bob  type4
3   2  shelby  type1
4   2  shelby  type6
5   3  jordan  type5
6   3  jordan  type8
7   3  jordan  type2

I would like the end result to look like:

   ID    name  type1  type2  type4  type5  type6  type8
0   1     bob      1      1      1      0      0      0
1   2  shelby      1      0      0      0      1      0
2   3  jordan      0      1      0      1      0      1
Coldchain9
  • 1,373
  • 11
  • 31

2 Answers2

3

Here you go!

pd.get_dummies(df, columns=['type'], prefix='', prefix_sep='').groupby(['ID','name']).max().reset_index()
  • prefix='' and prefix_sep='' avoid adding extra prefix such as type_type1, type_type2 in columns names.
  • groupby(this_columns) lets you aggregate values from other columns by this_columns.
  • max() on groupby() will provide you a maximum of aggregated values, as an example, for bob, you will have type1 values 1,0,0 in dummy format, but you want it 1 if any of it is 1, so max() works here.
  • reset_index() gives your columns ID and name back which were converted to indices by groupby().
Zeel B Patel
  • 691
  • 5
  • 16
1

You can use the pandas.DataFrame.pivot_table method (documentation here)

df.pivot_table(index=['ID'], columns=['type'], aggfunc='count', fill_value=0)

Outputs

type type1 type2 type4 type5 type6 type8
ID                                      
1        1     1     1     0     0     0
2        1     0     0     0     1     0
3        0     1     0     1     0     1

I wasn't able to get your desired output with a single line method call, you will need to merge both dataframes and keep the columns you wanted.

You will notice that the pivot_table method returns a dataframe where the column ID is the index.

arhr
  • 1,505
  • 8
  • 16