3

Let's say we have a dataframe in this format:

id  properties
0   {"cat1":["p1","p2","p4"],"cat2":["p5", "p6"]}
1   {"cat1":["p3"],"cat2":["p7"]}

How can we convert it to this format?

id  p1    p2    p3    p4    p5    p6    p7
0   True  True  False True  True  True  False
1   False False True  False False False True

Keep in mind that the type of values in each cell is string. It has just two categories: cat1 and cat2

MTT
  • 5,113
  • 7
  • 35
  • 61

4 Answers4

3

Using itertools.chain.from_iterable and str.get_dummies

from itertools import chain

df.properties.apply(lambda s: ','.join(chain.from_iterable(s.values())))\
             .str.get_dummies(sep=',')\
             .astype(bool)

      p1     p2     p3     p4     p5     p6     p7
0   True   True  False   True   True   True  False
1  False  False   True  False  False  False   True
rafaelc
  • 57,686
  • 15
  • 58
  • 82
2

Using stack + explode + get_dummies

u = pd.DataFrame(df['properties'].tolist())

u.stack().explode().add(',').sum(level=0).str.get_dummies(',')

   p1  p2  p3  p4  p5  p6  p7
0   1   1   0   1   1   1   0
1   0   0   1   0   0   0   1
user3483203
  • 50,081
  • 9
  • 65
  • 94
2

I will use get_dummies

df.properties.map(lambda x : ','.join(sum(x.values(),[]))).str.get_dummies(',').astype(bool)
      p1     p2     p3     p4     p5     p6     p7
0   True   True  False   True   True   True  False
1  False  False   True  False  False  False   True
BENY
  • 317,841
  • 20
  • 164
  • 234
1

Set_index to id. As you said each cell is a string, so you need to convert df.properties from string of dict to dict by using ast.literal_eval. Next, and using str method to get cat1 and cat2 and combine their lists and explode them to rows and assign result to s. Finally, call pd.get_dummies on s with option dtype=bool and call max on level=0

import ast

df1 = df.set_index('id')
df1.properties = df1.properties.map(ast.literal_eval)
s = (df1.properties.str['cat1'] + df1.properties.str['cat2']).explode()
pd.get_dummies(s, dtype=bool).max(level=0)

Out[1035]:
       p1     p2     p3     p4     p5     p6     p7
id
0   True   True   False  True   True   True   False
1   False  False  True   False  False  False  True
Andy L.
  • 24,909
  • 4
  • 17
  • 29