0

I have a corpus of text containing sentences. I want to count the number of occurrences of each word and record each word only once (e.g. multiple occurrences of ',' should produce a single entry such as ',': 2047).

Desired output: 'partner': 7, 'meetings': 7, '14': 7, 'going': 7, etc. I realize that I need to use a set() to avoid duplicates, but I don't know how. Currently, I try to avoid adding elements that are already in the list by appending only if the word is not already in occurrences.

This, however, isn't working: I am getting ',': 2047 multiple times in the result.

I am avoiding list comprehensions in the sample code to aid the reader's comprehension! :P

# Counting occurrences of words[i] in words

occurrences = []
for i in range(1, words.__len__() - 1):
    if words[i-1] not in occurrences:
        occurrences.append((words[i - 1], words.count(words[i - 1])))
print(occurrences)
Ketcomp

2 Answers

1

Use collections.Counter:

from collections import Counter
word_count = Counter(words)
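
A slightly fuller sketch of the same idea, using a small made-up `words` list (the sample tokens are an assumption, not the asker's corpus). It also helps explain why the loop in the question repeats entries: `occurrences` holds `(word, count)` tuples, so the test `words[i - 1] not in occurrences` compares a plain string against tuples and is always true.

```python
from collections import Counter

# Hypothetical sample tokens standing in for the asker's corpus.
words = ["partner", ",", "meetings", ",", "partner", "going", ","]

# Counter tallies every distinct token in a single pass over the list.
word_count = Counter(words)

# most_common() returns (word, count) pairs sorted by descending count.
print(word_count.most_common())

# Looking up a word that never appeared gives 0 rather than a KeyError.
print(word_count["absent"])
```

Calling `words.count(...)` inside a loop rescans the whole list for every word, whereas `Counter` needs only one pass, which probably matters for a corpus of any real size.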
Steven Rumbalski
0

According to this answer, I should use Counter() like so:

from collections import Counter
ctr = Counter()
for word in words:
    ctr[word] += 1
print(ctr)
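
For comparison, the set-based idea from the question can be made to work by tracking the words already seen separately from the results. This is only a minimal sketch: the function name `count_with_seen_set` and the sample list are hypothetical, not part of the original code.

```python
def count_with_seen_set(words):
    """Return (word, count) pairs, one per distinct word, in first-seen order."""
    seen = set()      # words that already have an entry
    occurrences = []  # (word, count) pairs
    # Iterate over the list directly; the original
    # range(1, len(words) - 1) with words[i - 1] skips the last two elements.
    for word in words:
        if word not in seen:
            seen.add(word)
            occurrences.append((word, words.count(word)))
    return occurrences


# Hypothetical sample input.
sample = ["a", ",", "b", ",", "a"]
print(count_with_seen_set(sample))
```

The membership test runs against a set of plain strings here, so it can actually match, unlike a test against the list of tuples.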
Community
Ketcomp