Leanest way to compute term frequency on a bag ADT without using the Counter class

Question

I have some code that works well for computing term frequency on a given list using the Counter class from collections.

from collections import Counter

terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

tf = Counter(terms)

print(tf)

The existing code works great, but I am wondering what the leanest way would be to achieve the same result strictly using a bag/multiset ADT, without the help of Python's Counter class.

I have spent several days experimenting with code and looking on other forums without much success.

By "leanest" do you mean fastest, or with the least code? Or something else? – doctorlove Dec 01 '17 at 16:31
How does that help with the task at hand? A bunch of singleton multisets? A single multiset that you still must iterate to count? – user2390182 Dec 01 '17 at 16:52

Ajax1234 · Accepted Answer · 2017-12-01T16:47:35.410

2

You can use a single dictionary comprehension:

terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
# Iterate over the unique terms so each one is counted only once
new_terms = {term: terms.count(term) for term in set(terms)}

Output:

{'lazy': 1, 'over': 1, 'fox': 2, 'dog': 1, 'quick': 1, 'the': 3, 'jumps': 1}

Using the third-party multiset package:

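import itertools
import multiset  # third-party package: pip install multiset

# Sort so equal terms are adjacent, then wrap each group in its own Multiset
final_data = [multiset.Multiset(list(group)) for _, group in itertools.groupby(sorted(terms))]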

Output:

[Multiset({'dog': 1}), Multiset({'fox': 2}), Multiset({'jumps': 1}), Multiset({'lazy': 1}), Multiset({'over': 1}), Multiset({'quick': 1}), Multiset({'the': 3})]
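If installing a third-party package is not an option, a bag ADT can also be sketched by hand. The following is only a minimal illustration, not the multiset package's API: the class name Bag and its methods add, count and distinct are hypothetical names chosen for this example, and the storage is an assumption (a dict mapping each item to its multiplicity).

```python
class Bag:
    """Minimal bag (multiset) ADT sketch: a collection that allows duplicates.

    Hypothetical illustration only; names and storage are assumptions.
    """

    def __init__(self, items=()):
        self._counts = {}  # maps each distinct item to its multiplicity
        for item in items:
            self.add(item)

    def add(self, item):
        """Insert one occurrence of item."""
        self._counts[item] = self._counts.get(item, 0) + 1

    def count(self, item):
        """Return how many times item occurs (0 if absent)."""
        return self._counts.get(item, 0)

    def distinct(self):
        """Iterate over the distinct items."""
        return iter(self._counts)

    def __len__(self):
        """Total number of items, duplicates included."""
        return sum(self._counts.values())


terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
bag = Bag(terms)

# Term frequency is just the multiplicity of each distinct item in the bag
tf = {term: bag.count(term) for term in bag.distinct()}
print(tf)
```

Filling the bag takes a single pass over the list, and each count lookup afterwards is a dict access, so nothing is rescanned.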

edited Dec 01 '17 at 16:47

answered Dec 01 '17 at 16:33

Ajax1234


This is using a dictionary though, right? Not a bag/multiset. – Syntax Killer Dec 01 '17 at 16:40
import multiset: PyCharm doesn't seem to like that. – Syntax Killer Dec 01 '17 at 16:58
@SyntaxKiller what specifically is the error message? – Ajax1234 Dec 01 '17 at 16:59

My apologies, it does now; it took a little fit. – Syntax Killer Dec 01 '17 at 17:00

score 0 · Answer 2 · answered Dec 01 '17 at 16:37


You can use a plain dict: loop over the terms and update each count using get with a default value:

tf = {}
for t in terms:
    tf[t] = tf.get(t, 0) + 1
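For a self-contained run, the same loop can be wrapped in a function, with a small helper that ranks the result by frequency. This is just a sketch: the names term_frequency and most_common are hypothetical helpers introduced for illustration, and the ranking relies on sorted being stable, so ties keep their first-seen order on Python 3.7+.

```python
def term_frequency(terms):
    """Count occurrences of each term in a single pass."""
    tf = {}
    for t in terms:
        tf[t] = tf.get(t, 0) + 1  # default of 0 the first time a term is seen
    return tf


def most_common(tf, n=None):
    """Return (term, count) pairs, highest count first."""
    ranked = sorted(tf.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if n is None else ranked[:n]


terms = ['the', 'fox', 'the', 'quick', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
tf = term_frequency(terms)
print(most_common(tf, 2))  # [('the', 3), ('fox', 2)]
```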

answered Dec 01 '17 at 16:37

user2390182

