Consider a pandas DataFrame with a column 'id' in which each row is a comma-separated string of categories. What is an efficient way to obtain the dummy variables?

Example:

Input:

import pandas as pd

df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

Output:

df2 = pd.DataFrame({'0':[1, 0, 0],
               '1': [1, 0, 1],
               '24':[0, 1, 1],
               '25':[0, 1, 0]})
martineau
Shree

1 Answer

Use the .str accessor version of get_dummies:

df1['id'].str.get_dummies(sep=',')

The resulting output:

   0  1  24  25
0  1  1   0   0
1  0  0   1   1
2  0  1   1   0
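
For completeness, here is a minimal self-contained sketch of the same call applied to the question's data. The numeric reordering of the columns and the join back onto df1 are additions for illustration, not part of the original question, and the reordering assumes every category label is an integer string. The names dummies and result are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

# Split each string on ',' and build one 0/1 indicator column per category.
dummies = df1['id'].str.get_dummies(sep=',')

# Column labels are strings; reorder them by numeric value.
# Assumption: every label parses as an integer.
dummies = dummies[sorted(dummies.columns, key=int)]

# Optionally attach the indicator columns to the original frame.
result = df1.join(dummies)
print(result)
```

Note that the column labels stay strings ('0', '1', '24', '25'), which matches the keys used in df2 in the question.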
root