Python: Count of occurrences in dict from another list

Question

I am trying to count the number of times each word from a list of words of interest occurs in a dict column.

First I import my data:

products = graphlab.SFrame('amazon_baby.gl/')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
products.head(5)

Data can be found here: https://drive.google.com/open?id=0BzbhZp-qIglxM3VSVWRsVFRhTWc

I then create a list of the words I am interested in:

words = ['awesome', 'great', 'fantastic']

I would like to count the number of times each word in "words" occurs in products['word_count'].

I am not married to using graphlab. It was just suggested to me by a colleague.

Welcome to SO. We'd like to see evidence of your effort to complete your code. As is it looks like you have the bare structure and don't know how to complete it, which isn't what SO is for. Please read "[ask]" including the links and "[mcve]". I'd also recommend reading http://meta.stackoverflow.com/q/261592/128421. — the Tin Man, Jun 06 '16 at 21:52

hipoglucido · Answer 1 · 2016-06-04T13:37:06.473

I am not quite sure what you mean by 'in a dict column'. If it is a list of texts:

import operator

dictionary = {'texts': ['red blue blue', 'red black', 'blue white white', 'red', 'white', 'black', 'blue red']}
words = ['red', 'white', 'blue']
freqs = dict()
for t in dictionary['texts']:
    for w in words:
        # str.count counts substring matches; start from 0 for unseen words
        freqs[w] = freqs.get(w, 0) + t.count(w)
# Sort (word, count) pairs by count, highest first
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)

If it is just one text:

import operator

dictionary = {'text': 'red blue blue red black blue white white red white black blue red'}
words = ['red', 'white', 'blue']
freqs = dict()
for w in words:
    # Each word is visited once, so a plain assignment is enough
    freqs[w] = dictionary['text'].count(w)
# Sort (word, count) pairs by count, highest first
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)
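
Note that str.count counts substring matches, so 'red' is also found inside 'bored'. A minimal sketch of a whole-word variant, assuming words are separated by whitespace and case does not matter (the helper name count_whole_words is hypothetical):

```python
from collections import Counter

def count_whole_words(texts, words):
    """Count whole-word occurrences of each word in `words` across `texts`."""
    totals = Counter()
    for text in texts:
        # Split on whitespace so 'red' does not match inside 'bored'
        totals.update(text.lower().split())
    # Keep only the words of interest; a Counter returns 0 for unseen words
    return {w: totals[w] for w in words}

# Made-up sample texts for illustration
texts = ['red blue blue', 'bored red', 'blue white white']
freqs = count_whole_words(texts, ['red', 'white', 'blue'])
top_words = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
```

Punctuation attached to a word (for example 'red,') would still need to be stripped before counting.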

score 1 · Answer 2 · answered Jun 04 '16 at 14:14

If you want to count occurrences of words, a fast way to do it is to use a Counter object from collections.

For example:

In [3]: from collections import Counter
In [4]: c = Counter(['hello', 'world'])

In [5]: c
Out[5]: Counter({'hello': 1, 'world': 1})
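
Applied to the question, each row of the word_count column is a dict mapping a word to its count, so the per-row dicts can be summed into one Counter and then read off for the words of interest. A minimal sketch, assuming the column has been pulled out as a plain list of dicts (the sample rows are made up):

```python
from collections import Counter

# Made-up stand-in for products['word_count']: one dict per review
word_count_rows = [
    {'this': 1, 'is': 1, 'awesome': 2},
    {'great': 1, 'product': 1},
    {'awesome': 1, 'and': 1, 'great': 3},
]
words = ['awesome', 'great', 'fantastic']

total = Counter()
for row in word_count_rows:
    total.update(row)  # update() adds counts when given a mapping

# A Counter returns 0 for keys it has never seen
counts = {w: total[w] for w in words}
```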

Could you show the output of your products.head(5) command?

score 1 · Answer 3 · answered Jun 06 '16 at 18:01

If you stick with graphlab (or SFrame), use the SArray.dict_trim_by_keys method. The documentation is here: https://dato.com/products/create/docs/generated/graphlab.SArray.dict_trim_by_keys.html

import graphlab as gl
sf = gl.SFrame({'review': ['what a good book', 'terrible book']})
sf['word_bag'] = gl.text_analytics.count_words(sf['review'])

keywords = ['good', 'book']
sf['key_words'] = sf['word_bag'].dict_trim_by_keys(keywords, exclude=False)
print(sf)

+------------------+---------------------+---------------------+
|      review      |       word_bag      |      key_words      |
+------------------+---------------------+---------------------+
| what a good book | {'a': 1, 'good':... | {'good': 1, 'boo... |
|  terrible book   | {'book': 1, 'ter... |     {'book': 1}     |
+------------------+---------------------+---------------------+ 
[2 rows x 3 columns]
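
For readers without graphlab, the effect of trimming with exclude=False can be approximated in plain Python with a dict comprehension per row. This is only a sketch of the behavior shown in the table above, not graphlab's implementation, and trim_by_keys is a hypothetical helper name:

```python
def trim_by_keys(word_bag, keys):
    """Keep only the entries of `word_bag` whose key is in `keys`."""
    keep = set(keys)
    return {k: v for k, v in word_bag.items() if k in keep}

# Word bags matching the two reviews in the example above
word_bags = [
    {'what': 1, 'a': 1, 'good': 1, 'book': 1},
    {'terrible': 1, 'book': 1},
]
keywords = ['good', 'book']
key_words = [trim_by_keys(bag, keywords) for bag in word_bags]
```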

Leila S · Answer 4 · 2020-09-06T05:19:27.650

Do you want to put each of the counts in a separate column? In that case this may work:

keywords = ['keyword1', 'keyword2']

def word_counter(dict_cell, word):
    # Return the count stored for `word`, or 0 if the word is absent
    if word in dict_cell:
        return dict_cell[word]
    else:
        return 0

# Add one column per keyword
for word in keywords:
    df[word] = df['word_count'].apply(lambda x: word_counter(x, word))
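
The same per-keyword columns can be sketched without a data frame library, assuming the word_count column is available as a list of dicts (the sample rows below are made up). Here dict.get with a default of 0 does the job of word_counter:

```python
# Made-up stand-in for the word_count column: one dict per row
word_count_rows = [
    {'awesome': 2, 'stroller': 1},
    {'great': 1, 'awesome': 1},
    {'bad': 1},
]
keywords = ['awesome', 'great']

# One list per keyword, aligned with the rows
columns = {
    word: [row.get(word, 0) for row in word_count_rows]
    for word in keywords
}
```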

edited Sep 06 '20 at 05:19

answered Sep 06 '20 at 05:08


score 0 · Answer 5 · edited Mar 26 '21 at 06:22

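def count_words(x, w):
    # str.count returns 0 when w is absent, so no membership check is needed.
    # Note: this counts substring matches in the raw review text.
    return x.count(w)

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

# Add one column per selected word
for word in selected_words:
    products[word] = products['review'].apply(lambda x: count_words(x, word))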

edited Mar 26 '21 at 06:22

Suraj Rao


answered Mar 26 '21 at 06:00

Prashath Manorathna
