
Given a Series whose inner lists have unknown, varying lengths:

import pandas as pd
sr = pd.Series([['a', 'b', 'c', 'b'], ['a', 'a', 'd'], ['b']])

[out]:

0    [a, b, c, b]
1       [a, a, d]
2             [b]

The goal is to use the values in the inner lists as columns and fill each cell with the number of times that value occurs in the row, i.e.

     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN

I have tried achieving the above by iterating over the rows, converting each one into a Counter object, and rebuilding the dataframe from the list of counter dictionaries:

>>> from collections import Counter
>>> pd.DataFrame([dict(Counter(row)) for row in pd.Series([['a', 'b', 'c', 'b'], ['a', 'a', 'd'], ['b']])])

Is there a simpler way to do this? Perhaps with .pivot()?

alvas

2 Answers


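Pass a Counter per row straight to the DataFrame constructor (with from collections import Counter, as in the question):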

In [179]: pd.DataFrame(Counter(x) for x in sr)
Out[179]:
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN

Or

In [182]: sr.apply(lambda x: pd.Series(Counter(x)))
Out[182]:
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN

Or use value_counts:

In [170]: sr.apply(lambda x: pd.Series(x).value_counts())
Out[170]:
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN

Or

In [174]: pd.DataFrame(pd.Series(x).value_counts() for x in sr)
Out[174]:
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN
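
As a follow-up sketch: the outputs above hold floats because missing keys become NaN. If integer counts with zeros are preferred (an assumption; the question's expected output keeps NaN), the Counter approach can be extended as below. The names counts and counts_int are illustrative only.

```python
from collections import Counter

import pandas as pd

sr = pd.Series([['a', 'b', 'c', 'b'], ['a', 'a', 'd'], ['b']])

# One Counter per row; keys absent from a row show up as NaN.
counts = pd.DataFrame([Counter(row) for row in sr], index=sr.index)

# Optional step (not asked for in the question): zeros instead of NaN,
# which lets the columns be cast back to integers.
counts_int = counts.fillna(0).astype(int)

print(counts_int)
```

Passing index=sr.index keeps the original row labels if the Series does not use the default 0..n-1 index.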
Zero

If the input is a list of lists, as in the previous question:

from collections import Counter
import pandas as pd

lol = [['a', 'b', 'c', 'b'], ['a', 'a', 'd'], ['b']]
df = pd.DataFrame(Counter(x) for x in lol)
print(df)
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN

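If the input is a Series:

# Series.value_counts is used per row; the top-level pd.value_counts is deprecated in recent pandas
df = pd.DataFrame(sr.values.tolist()).apply(lambda row: row.value_counts(), axis=1)
print(df)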
     a    b    c    d
0  1.0  2.0  1.0  NaN
1  2.0  NaN  NaN  1.0
2  NaN  1.0  NaN  NaN
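
For a Series input, another possible route stays inside pandas without building Counter objects. This is only a sketch and assumes a pandas version that provides Series.explode (0.25 or later); the names exploded and wide are illustrative.

```python
import pandas as pd

sr = pd.Series([['a', 'b', 'c', 'b'], ['a', 'a', 'd'], ['b']])

# Flatten the inner lists; each element keeps the index of its source row.
exploded = sr.explode()

# Count values within each original row, then pivot the values into columns.
# Combinations that never occur are left as NaN by unstack().
wide = exploded.groupby(level=0).value_counts().unstack()

print(wide)
```

This is the closest of the options here to the pivot-style reshaping the question asks about, since unstack() moves the counted values from the index into the columns.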
jezrael