I have this sample code using Python's Counter:

from collections import Counter

lst = ['item', 'itemm', 'iitem', 'foo', 'bar']
c = Counter(lst)
# c is Counter({'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1})

If I do c['item'] I get 1, but I want to get 3 due to the typos in the list.

I tried the following. It doesn't give me 3, but I can still work with it:

import re

for word in lst:
    if re.search('item', word):
        print(word, c[word])

item 1
itemm 1
iitem 1

Is there a more efficient way to do it without looping through the list?

Leb

2 Answers

You can use a list comprehension along with sum:

>>> import re
>>> d = {'bar': 1, 'foo': 1, 'iitem': 1, 'item': 1, 'itemm': 1}
>>> sum([d[i] for i in d.keys() if re.search(r'item', i)])
3

Or, without regex:

>>> sum([d[i] for i in d.keys() if 'item' in i])
3
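
The same idea can be applied directly to the Counter from the question. The following is only a minimal sketch: the helper name count_matching is hypothetical, and it uses a generator expression over c.items() so that no intermediate list is built. It still iterates, but over the distinct keys rather than the original list, and plain substring matching will catch 'itemm' and 'iitem' but not a typo such as 'itme'.

```python
from collections import Counter


def count_matching(counter, needle):
    """Sum the counts of every key that contains `needle` as a substring.

    `count_matching` is a hypothetical helper name used for illustration.
    """
    return sum(count for key, count in counter.items() if needle in key)


lst = ['item', 'itemm', 'iitem', 'foo', 'bar']
c = Counter(lst)
print(count_matching(c, 'item'))  # 3
```

Because the loop runs over the Counter's keys, repeated words in the original list are visited only once each.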

Avinash Raj

Let me add a few details on approximate string matching, which is the underlying problem here.

Spelling errors can be matched using an edit distance check (the so-called Levenshtein distance). It can be calculated with the python-Levenshtein package:

from Levenshtein import distance
edit_dist = distance("ah", "aho")

The example is taken from a question on SO referring to this particular module.

Another reference for fuzzy string matching in Python.
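
To connect edit distance back to the Counter in the question, here is a rough sketch that sums the counts of all keys within a small edit distance of a target word. It makes several assumptions: levenshtein is a plain-Python stand-in for the package's distance function so that the snippet runs without third-party installs, fuzzy_count is a hypothetical helper name, and the threshold max_distance=1 is an arbitrary choice that would need tuning for real data.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions).

    Plain-Python stand-in for Levenshtein.distance, used here only so the
    example has no third-party dependency.
    """
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def fuzzy_count(counter, target, max_distance=1):
    """Sum counts of keys whose edit distance to `target` is small enough.

    `fuzzy_count` and `max_distance` are hypothetical names for illustration.
    """
    return sum(count for key, count in counter.items()
               if levenshtein(key, target) <= max_distance)


c = Counter(['item', 'itemm', 'iitem', 'foo', 'bar'])
print(fuzzy_count(c, 'item'))  # 3
```

Unlike substring matching, this also treats a substitution such as 'itam' as a match, at the cost of computing a distance for every distinct key.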

sophros