1

I have categorical data (A, B, etc.) in which multiple matches can exist within the same field such as A,B. I would like to break my data into additional rows just for the purpose of counting the number of instances of each value.

df = pd.DataFrame({"Values" : ["A", "B", "C", "A,B"]})
df
    Values
0   A
1   B
2   C
3   A,B

Currently:

df["Values"].value_counts()
B       1
A,B     1
A       1
C       1
Name: Values, dtype: int64

My ideal function would work something like this:

df["Values"].value_counts(split = ",")
A    2
B    2
C    1
Name: Values, dtype: int64
Francis Smart
  • 3,875
  • 6
  • 32
  • 58

3 Answers3

2

Use Series.str.split and then explode()

print( df['Values'].str.split(',').explode().value_counts() )

Prints:

A    2
B    2
C    1
Name: Values, dtype: int64

EDIT:

df = pd.DataFrame({"Values" : ["A", "B", "C", "A,B"]})
print( df['Values'].str.split(',').explode().value_counts() )
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
2

Try with stack after split and value_counts

df.Values.str.split(',',expand=True).stack().value_counts()
A    2
B    2
C    1
dtype: int64
BENY
  • 317,841
  • 20
  • 164
  • 234
1

If you don't need to worry about counting duplicates values on the same row Series.str.get_dummies + sum

df['Values'].str.get_dummies(',').sum()

A    2
B    2
C    1
dtype: int64
ALollz
  • 57,915
  • 7
  • 66
  • 89