
I have a dictionary where each key is a row index and each value is a list of dummy values. For example:

my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']}

Can I create a dataframe of dummies with the above in an efficient manner?

>>> df
      a      b      c
row1  True   True   False
row2  True   False  False
row3  False  True   True
Joe B

2 Answers


You can use pd.get_dummies:

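u = pd.DataFrame.from_dict(my_dict, orient='index')
# .max(level=0, axis=1) was removed in pandas 2.0; merge the duplicate labels via a transpose instead.
pd.get_dummies(u, prefix='', prefix_sep='').T.groupby(level=0).max().T.astype(bool)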

          a      b      c
row1   True   True  False
row2   True  False  False
row3  False   True   True

You can also use stack and str.get_dummies, which is more succinct but slightly slower.

# .max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent.
u.stack().str.get_dummies().groupby(level=0).max().astype(bool)

          a      b      c
row1   True   True  False
row2   True  False  False
row3  False   True   True
cs95
  • On my dataset I got an average of 7.15 ms for the first solution and 11.7 ms for the second. Would you be able to explain how these two construct the final dataframe? – Joe B Mar 26 '19 at 17:14
  • @JoeB pd.get_dummies is capable of generating one-hot encodings for multiple columns. In the second case, str.get_dummies works on a single column, so we stack the data beforehand. – cs95 Mar 26 '19 at 17:29
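
To make that explanation concrete, here is a rough step-by-step sketch of how each approach builds the result. The intermediate names (wide, long, first, second) are hypothetical and only for illustration, and the groupby spelling is an assumption for recent pandas versions rather than the answer's original code.

```python
import pandas as pd

my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']}

# One row per key; shorter lists are padded with missing values.
u = pd.DataFrame.from_dict(my_dict, orient='index')

# First approach: encode every positional column at once.
# With an empty prefix the labels collide, so wide has columns a, b, b, c.
wide = pd.get_dummies(u, prefix='', prefix_sep='')

# Merge columns that share a label by taking the maximum across them.
first = wide.T.groupby(level=0).max().T.astype(bool)

# Second approach: stack into one long Series indexed by (row, position),
# because str.get_dummies works on a single column of strings.
long = u.stack().dropna()

# Encode the long Series, then merge the rows that share a row label.
second = long.str.get_dummies().groupby(level=0).max().astype(bool)

print(first)
print(second)
```

Both routes end with the same max-per-label step; they differ only in whether the duplicates to merge are columns (first) or rows (second).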

Use crosstab with the DataFrame constructor:

s = pd.DataFrame(list(my_dict.values()), index=my_dict.keys()).stack()

pd.crosstab(s.index.get_level_values(0), s).astype(bool)
Out[131]: 
col_0      a      b      c
row_0                     
row1    True   True  False
row2    True  False  False
row3   False   True   True
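
As a minimal sketch of why this works: crosstab counts how often each (row label, value) pair occurs, and astype(bool) turns nonzero counts into True. The variant below is an assumption for illustration, not the answer's code; it builds the pairs with a plain list comprehension instead of stack, and the names pairs, counts, and dummies are hypothetical.

```python
import pandas as pd

my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']}

# Long form: one (row label, value) pair per list element.
pairs = [(row, value) for row, values in my_dict.items() for value in values]
rows, values = zip(*pairs)

# Frequency table: rows are the dict keys, columns are the distinct values.
counts = pd.crosstab(pd.Series(rows, name='row'), pd.Series(values, name='value'))

# Any nonzero count means the value is present for that row.
dummies = counts.astype(bool)
print(dummies)
```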
BENY