Summing up collections.Counter objects using `groupby` in pandas

Question

I am trying to group the words_count column by both essay_set and domain1_score and sum the Counter objects in words_count within each group, using Counter addition as shown here:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})

I grouped them with this command: words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"]), but I do not know how to apply Counter addition (which is simply +) to the words_count column. Here is my dataframe:

score 1 · Accepted Answer · answered Dec 31 '20 at 18:57

GroupBy.sum works with Counter objects. However, I should mention that the addition is done pairwise, so this may not be very fast. Let's try:

words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()

from collections import Counter
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df

   a             b
0  1  {1: 1, 2: 1}
1  1  {1: 1, 3: 1}
2  2  {2: 1, 3: 1}


df.groupby(by=['a'])['b'].sum()

a
1    {1: 2, 2: 1, 3: 1}
2          {2: 1, 3: 1}
Name: b, dtype: object
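
Since pairwise `+` builds a new Counter at every step, one possible way to reduce that overhead is to accumulate each group into a single Counter with `update`. The following is only a sketch on the same toy frame; `merge_counters` and `merged` are hypothetical names introduced here, not part of pandas, and whether this is actually faster on real data is an assumption that should be measured.

```python
from collections import Counter
import pandas as pd

def merge_counters(counters):
    """Accumulate an iterable of Counters into one Counter, in place."""
    total = Counter()
    for c in counters:
        total.update(c)  # adds counts into `total` without creating a new Counter
    return total

df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])],
})

# Iterate over the groups explicitly and merge each group's Counters.
# The result is a plain dict mapping each group key to its merged Counter.
merged = {key: merge_counters(group) for key, group in df.groupby('a')['b']}
```

Here `merged[1]` holds the combined counts for the rows where `a == 1`. Wrapping the result in `pd.Series(merged)` is one way to get back to a pandas object if needed.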

cs95

I have approximately 13K rows, and it took ~`26` seconds. Thank you so much! It worked. – Hazem Alabiad Dec 31 '20 at 19:06

@White159 that's piss poor, and also expected. If you're having performance trouble consider exploding your dictionaries. With the right data representation this operation could take a fraction of a second. – cs95 Dec 31 '20 at 20:28
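
One reading of "exploding your dictionaries" is to reshape the data into a long format, with one row per (group key, word, count), so that the sum runs on a plain numeric column instead of Python objects. The sketch below uses the toy frame from the answer; the column names `word` and `count` and the variables `long_df` and `totals` are assumptions made for illustration.

```python
from collections import Counter
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])],
})

# Flatten every Counter into (group key, word, count) rows.
rows = [
    (key, word, n)
    for key, counter in zip(df['a'], df['b'])
    for word, n in counter.items()
]
long_df = pd.DataFrame(rows, columns=['a', 'word', 'count'])

# Sum the integer counts per (group key, word) pair.
totals = long_df.groupby(['a', 'word'])['count'].sum()
```

`totals` is a Series indexed by `(a, word)`, so `totals.loc[(1, 1)]` gives the total count of word `1` in group `1`. For the original question, the grouping keys would be `essay_set` and `domain1_score` plus the word column.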
