Using Counter() function in python

Question

I have an excel file with many rows and columns. I want to do the following. First, I want to filter the rows based on a text match. Second, I want to choose a particular column and generate word frequency for ALL THE WORDS in that column. Third, I want to graph the word and frequency.

I have figured out the first part. My question is how to apply Counter() on a dataframe. If I just use Counter(df), it returns an error. So, I used the following code to convert each row into a list and then applied Counter. When I do this, I get the word frequency for each row separately (if I use counter within the for loop, else I get the word frequency for just one row). However, I want a word count for all the rows put together. Appreciate any inputs. Thanks! The following is an example data.

product      review
a            Great Product
a            Delivery was fast 
a            Product received in good condition
a            Fast delivery but useless product
b            Dont recommend
b            I love it
b            Please dont buy
b            Second purchase

My desired output is like this: for product a - (product,3),(delivery,2)(fast,2) etc..my current output is like (great,1), (product,1) for the first row.

This is the code I used.

strdata = column.values.tolist() 
tokens = [tokenizer.tokenize(str(i)) for i in strdata] 

cleaned_list = []
for m in tokens:
    stopped = [i for i in m if str(i).lower() not in stop_words] 
    stemmed = [stemmer.stem(i) for i in stopped] 
    cleaned_list.append(stopped) #append stemmed words to list
    count = Counter(stemmed)
    print(count.most_common(10))

not sure how to post a table here. This is exactly my question but the solution does not seem to work. https://stackoverflow.com/questions/46786211/counting-the-frequency-of-words-in-a-pandas-data-frame — Ishwarya Balasubramanian, Apr 07 '21 at 08:03
You can edit your question and put table in the code fences. — Ynjxsjmh, Apr 07 '21 at 08:04

score 1 · Accepted Answer · answered Apr 07 '21 at 09:22

Firstly, using groupby concatenate strings from same group.

Secondly, apply Counter() on joined strings.

joined = df.groupby('product', as_index=False).agg({'review' : ' '.join})
joined['count'] = joined.apply(lambda x: collections.Counter(x['review'].split(' ')), axis=1)

# print(joined)

  product                                             review                                              count
0       a  Great Product Delivery was fast Product receiv...  {'Great': 1, 'Product': 2, 'Delivery': 1, 'was...
1       b  Dont recommend I love it Please dont buy Secon...  {'Dont': 1, 'recommend': 1, 'I': 1, 'love': 1,...

linog · Answer 2 · 2021-04-08T16:28:36.440

You can use the following function. The idea is

Group your data by byvar. Combine every words in yvar as a list.
Apply Counter and, if you want, select the most common
Explode to get a long formatted dataframe (easier to analyze afterwards)
just keep the relevent columns (word and count in a new dataframe) :

from collections import Counter
import pandas as pd

def count_words_by(data, yvar, byvar):
    cw = pd.DataFrame({'counter' : data
        .groupby(byvar)
        .apply(lambda s: ' '.join(s[yvar]).split())
        .apply(lambda s: Counter(s))
#        .apply(lambda s: s.most_common(10)) #uncomment this line if you want the top 10 words
        .explode()}
    )
    cw[['word','count']] = pd.DataFrame(cw['counter'].tolist(), index=cw.index)
    cw_red = cw[['word','count']].reset_index()

    return cw_red

count_words_by(data = df, yvar = "review", byvar = "product")

where I assume you start from there:

product      review
a            Great Product
a            Delivery was fast 
a            Product received in good condition
a            Fast delivery but useless product
b            Dont recommend
b            I love it
b            Please dont buy
b            Second purchase

Thanks! I cannot understand some parts of this code..but, it works! — Ishwarya Balasubramanian, Apr 08 '21 at 01:15

Using Counter() function in python

2 Answers2