I have some code that works well for computing term frequencies over a given list, using the Counter class from the collections module:

from collections import Counter

terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

tf = Counter(terms)

print(tf)

The existing code works great, but I am wondering what the leanest way would be to achieve the same result strictly using a bag/multiset ADT, without the help of Python's Counter class.

I have spent several days experimenting with code and looking on other forums without much success.

Syntax Killer

2 Answers

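You can use a single dictionary comprehension. Note that list.count rescans the whole list for every term, so this takes quadratic time in the length of the list: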

terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
new_terms = {term: terms.count(term) for term in terms}

Output:

{'lazy': 1, 'over': 1, 'fox': 2, 'dog': 1, 'quick': 1, 'the': 3, 'jumps': 1}
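A small variation on the comprehension above, offered only as a sketch: iterating over set(terms) means count() runs once per distinct term rather than once per list element. The name term_counts is just illustrative, and the key order of the resulting dict is not guaranteed.

```python
terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

# Build the dict from the distinct terms only; each count() call
# still scans the full list, but repeated terms are not recounted.
term_counts = {term: terms.count(term) for term in set(terms)}

print(term_counts['the'])
```

This is still not linear time, so for long lists a single pass that updates a dict is likely the better fit.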

Using the third-party multiset package, with one Multiset per distinct term:

import itertools
import multiset
final_data = [multiset.Multiset(list(b)) for a, b in itertools.groupby(sorted(terms))]

Output:

[Multiset({'dog': 1}), Multiset({'fox': 2}), Multiset({'jumps': 1}), Multiset({'lazy': 1}), Multiset({'over': 1}), Multiset({'quick': 1}), Multiset({'the': 3})]
Ajax1234

You can use a plain dict, loop over the terms, and update each count using dict.get with a default value of 0:

tf = {}
for t in terms:
    tf[t] = tf.get(t, 0) + 1
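Since the question asks for a bag/multiset ADT, here is one way the same get-with-default update might be wrapped in a small class. This is a minimal sketch, not a standard implementation: the Bag class and its method names (add, count, items) are hypothetical and chosen only for illustration.

```python
class Bag:
    """Minimal bag (multiset) sketch: maps each item to its multiplicity."""

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # Same get-with-default update as the plain dict loop.
        self._counts[item] = self._counts.get(item, 0) + 1

    def count(self, item):
        # Items that were never added have multiplicity 0.
        return self._counts.get(item, 0)

    def __len__(self):
        # Total number of items, duplicates included.
        return sum(self._counts.values())

    def items(self):
        # (item, multiplicity) pairs, in first-insertion order.
        return self._counts.items()


terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
tf = Bag(terms)
print(tf.count('the'))
print(dict(tf.items()))
```

Keeping the counts in a private dict makes add and count constant time on average, and the whole list is processed in a single pass.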
user2390182