regex match item in python Counter

Question

I have this sample code for python Counter.

from collections import Counter

lst = ['item', 'itemm', 'iitem', 'foo', 'bar'] 
c = Counter(lst) 
Counter({'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1})

If I do c['item'] I get 1, but I want to get 3 due to the typos in the list.

I tried the following. It doesn't give me 3, but I can still work with it:

import re

for word in lst:
    if re.search('item',word):
        print(word,c[word])

item 1
itemm 1
iitem 1

Is there a more efficient way to do it without looping through the list?

score 4 · Accepted Answer · answered Nov 05 '15 at 18:20

You can use a list comprehension along with sum:

>>> import re
>>> d = {'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1}
>>> sum([d[i] for i in d.keys() if re.search(r'item', i)])
3

Or, without regex:

>>> sum([d[i] for i in d.keys() if 'item' in i])
3
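
The same idea can be wrapped in a small helper that works directly on the Counter. This is only a sketch: count_matching is a hypothetical name, not part of collections, and it still iterates once over the distinct keys (not over the original list).

```python
import re
from collections import Counter

def count_matching(counter, pattern):
    """Sum the counts of every key that the regex `pattern` matches."""
    regex = re.compile(pattern)  # compile once, reuse for every key
    return sum(n for key, n in counter.items() if regex.search(key))

lst = ['item', 'itemm', 'iitem', 'foo', 'bar']
c = Counter(lst)
print(count_matching(c, r'item'))  # → 3
```

Note that a substring or regex test like this only catches typos that still contain 'item'; a misspelling such as 'itme' would not be counted.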

Avinash Raj

Thank you, the one without regex is much faster. – Leb Nov 05 '15 at 18:31

score 0 · Answer 2 · answered Jul 26 '19 at 16:07

Let me give a few more details on approximate string matching, which is the underlying problem here.

Orthographic errors can be matched using an edit distance check (the so-called Levenshtein distance metric). It can be calculated with the python-Levenshtein package:

from Levenshtein import distance
edit_dist = distance("ah", "aho")

The example is taken from a question on SO referring to this particular module.

Another reference for fuzzy string matching in Python.
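
To connect this back to the Counter in the question, here is a minimal sketch that sums the counts of all keys within a given edit distance of a target word. To keep it self-contained, it uses a pure-Python edit_distance as a stand-in for Levenshtein.distance; both edit_distance and fuzzy_count are hypothetical helper names, and the threshold max_dist=1 is an assumption chosen to fit the sample data.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_count(counter, target, max_dist=1):
    """Sum the counts of keys within max_dist edits of target."""
    return sum(n for key, n in counter.items()
               if edit_distance(key, target) <= max_dist)

c = Counter(['item', 'itemm', 'iitem', 'foo', 'bar'])
print(fuzzy_count(c, 'item'))  # → 3
```

If python-Levenshtein is installed, edit_distance can be swapped for the distance function imported above. A larger max_dist catches more typos but also risks counting unrelated short words.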
