Pandas value_counts() with multiple matches in same row

Question

I have categorical data (A, B, etc.) in which a single field can hold several values, such as A,B. I would like to split such fields into additional rows, purely so that I can count how many times each value occurs.

import pandas as pd
df = pd.DataFrame({"Values" : ["A", "B", "C", "A,B"]})
df
    Values
0   A
1   B
2   C
3   A,B

Currently:

df["Values"].value_counts()
B       1
A,B     1
A       1
C       1
Name: Values, dtype: int64

My ideal function would work something like this:

df["Values"].value_counts(split = ",")
A    2
B    2
C    1
Name: Values, dtype: int64

Andrej Kesely · Accepted Answer · 2020-09-21T20:55:33.827

Use Series.str.split to turn each field into a list, then Series.explode() to give every list element its own row:

print( df['Values'].str.split(',').explode().value_counts() )

Prints:

A    2
B    2
C    1
Name: Values, dtype: int64

EDIT (full code):

import pandas as pd
df = pd.DataFrame({"Values" : ["A", "B", "C", "A,B"]})
print( df['Values'].str.split(',').explode().value_counts() )
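The `value_counts(split = ",")` call imagined in the question is not a real pandas parameter, but the split-and-explode approach can be wrapped in a small helper that behaves similarly. The sketch below is only an illustration: the function name `value_counts_split` is hypothetical, and the `str.strip()` step is an added assumption to tolerate fields written as "A, B". It needs pandas 0.25 or later because of `explode()`.

```python
import pandas as pd

def value_counts_split(series, sep=","):
    """Hypothetical helper: count every delimited token in a string Series."""
    tokens = series.str.split(sep).explode()  # one row per token (pandas >= 0.25)
    tokens = tokens.str.strip()               # assumption: ignore stray whitespace
    return tokens.value_counts()

df = pd.DataFrame({"Values": ["A", "B", "C", "A,B"]})
counts = value_counts_split(df["Values"])
print(counts)
```

For the sample frame this gives 2 for A, 2 for B and 1 for C, matching the output above.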

edited Sep 21 '20 at 20:55

answered Sep 21 '20 at 20:52

Andrej Kesely

AttributeError: 'Series' object has no attribute 'explode'. – Francis Smart Sep 21 '20 at 20:54
@FrancisSmart I posted the full code in EDIT. Still got the error? What version of pandas are you using? – Andrej Kesely Sep 21 '20 at 20:56
I still seem to be getting the same error on version 0.24.2. I will update. – Francis Smart Sep 21 '20 at 20:57
@FrancisSmart Yes, `explode()` was added in version 0.25. You're using an ancient version of pandas; can you upgrade? – Andrej Kesely Sep 21 '20 at 20:58

score 2 · Answer 2 · answered Sep 21 '20 at 21:22

Try split with expand=True, then stack the resulting columns and call value_counts:

df.Values.str.split(',',expand=True).stack().value_counts()
A    2
B    2
C    1
dtype: int64
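To see why this works without `explode()`, it can help to look at the intermediate objects. The sketch below simply breaks the one-liner into steps; the variable names `wide`, `long_form` and `counts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Values": ["A", "B", "C", "A,B"]})

# expand=True spreads the tokens over columns 0 and 1;
# rows with a single token get a missing value in column 1.
wide = df["Values"].str.split(",", expand=True)

# stack() moves the columns into a second index level, giving one token per row.
# Missing cells are either dropped by stack() or ignored by value_counts(),
# depending on the pandas version, so they never show up in the counts.
long_form = wide.stack()

counts = long_form.value_counts()
print(wide)
print(counts)
```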

BENY

score 1 · Answer 3 · answered Sep 21 '20 at 21:54

If you don't need to count duplicate values within the same row, use Series.str.get_dummies followed by sum. Each value is counted at most once per row:

df['Values'].str.get_dummies(',').sum()

A    2
B    2
C    1
dtype: int64
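The caveat about duplicates only matters when a value is repeated inside one field. A small comparison on made-up data (the Series `s` below is not from the question) shows the difference between counting occurrences and counting rows:

```python
import pandas as pd

# Made-up data: the second row repeats "A" within a single field.
s = pd.Series(["A", "A,A", "A,B"])

# split + explode counts every occurrence, including the repeat.
per_occurrence = s.str.split(",").explode().value_counts()

# get_dummies builds 0/1 indicator columns, so each row contributes at most 1 per value.
per_row = s.str.get_dummies(",").sum()

print(per_occurrence["A"])  # 4
print(per_row["A"])         # 3
```

For the data in the question no field repeats a value, so both approaches give the same counts.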

ALollz
