Python Counter() function to count words in documents with more than one occurrence

Question

I am working on an NLP (Natural Language Processing) project where I use Python's Counter() from the collections library. I am getting results in the following form:

OUTPUT:

Counter({'due': 23, 'support': 20, 'ATM': 16, 'come': 12, 'case': 11, 'Sallu': 10, 'tough,': 9, 'team': 8, 'evident': , 'likely': 6, 'rupee': 4, 'depreciated': 2, 'senior': 1, 'neutral': 1, 'told': 1, 'tour\n\nRussia’s': 1, 'Vladimir': 1, 'indeed,': 1, 'welcome,”': 1, 'player': 1, 'added': 1, 'Games,': 1, 'Russia': 1, 'arrest': 1, 'system.\nBut': 1, 'rate': 1, 'Tuesday': 1, 'February,': 1, 'idea': 1, 'ban': 1, 'data': 1, 'consecutive': 1, 'interbank': 1, 'man,': 1, 'involved': 1, 'aggressive': 1, 'took': 1, 'sure': 1, 'market': 1, 'custody': 1, 'gang.\nWithholding': 1, 'cricketer': 1})

The problem is that I want to extract only the words whose count is above a threshold, for example greater than 1 or 2.

I want to use the output to build a vocabulary list after removing the low-frequency words.

PS: I have more than 100 documents to test with, containing almost 2000 distinct words.

PPS: I have tried everything I could think of but have been unable to get the result. I only need the logic and will be able to implement it myself.

Your question formatting is really bad, nobody will understand what you are asking. — onetwo12, Apr 20 '18 at 12:29
I have updated the question format. I hope it is understandable now. — Muhammad Sulaman Toor, Apr 20 '18 at 12:34

score 5 · Accepted Answer · answered Apr 20 '18 at 12:34

You can iterate over the key-value pairs in the Counter and collect the matching keys in a separate list. This is only worthwhile if you want a list in the end; otherwise @jpp has the better solution.

from collections import Counter

myStr = "This this this is really really good."
myDict = Counter(myStr.split())

myList = [k for k, v in myDict.items() if v > 1]

# ['this', 'really']
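
Since the question mentions a cutoff of "greater than 1 or 2", the same comprehension can be wrapped in a small helper with an adjustable threshold. This is only a sketch: the name frequent_words and its min_count parameter are made up for illustration, and it returns the words most frequent first by relying on Counter.most_common().

```python
from collections import Counter


def frequent_words(counts, min_count=2):
    """Return the words that occur at least min_count times, most frequent first.

    frequent_words and min_count are hypothetical names, not part of collections.
    """
    return [word for word, n in counts.most_common() if n >= min_count]


counts = Counter("This this this is really really good.".split())

vocab = frequent_words(counts, min_count=2)
print(vocab)  # ['this', 'really']
```

Raising min_count to 3 would return an empty list for this sample string, because no token occurs three times ('This' and 'this' are counted separately).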

score 0 · Answer 2 · answered Apr 20 '18 at 12:31

You can use a dictionary comprehension to limit your Counter items to words that occur more than once:

from collections import Counter

c = Counter({'due': 23, 'support': 20, 'ATM': 16, 'come': 12, 'Russia': 1, 'arrest': 1})

res = Counter({k: v for k, v in c.items() if v > 1})

# Counter({'due': 23, 'support': 20, 'ATM': 16, 'come': 12})
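
The question mentions more than 100 documents, so one possible way to apply the same filter is to accumulate a single Counter over all of them first and filter once at the end. The sketch below assumes the documents are already loaded as plain strings; the documents list is invented sample data, and splitting on whitespace is a simplification.

```python
from collections import Counter

# Hypothetical stand-ins for the real documents.
documents = [
    "rupee depreciated due to weak support",
    "support for the rupee is due",
    "interbank rate data",
]

# Accumulate word counts across every document.
total = Counter()
for text in documents:
    total.update(text.split())

# Keep only the words seen more than once overall.
vocabulary = {word for word, n in total.items() if n > 1}
print(sorted(vocabulary))  # ['due', 'rupee', 'support']
```

Counter.update adds to the existing counts instead of replacing them, which is what makes the running total work.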

jpp

Thank you so much. It worked. I wasn't familiar much with this Counter function as I never used it. :) – Muhammad Sulaman Toor Apr 20 '18 at 12:40
