Making Python dictionary case insensitive by aggregating keys

Question

I have a dictionary in which many keys differ only by letter case, because dictionary keys are case sensitive. I'd like to merge each such group into a single lowercase key whose value is the sum of the values of the original keys.

I have something like:

>>> data
{'Blue Car': 73, 'blue Car': 21, 'yellow car': 10, 'Yellow Car': 15, 'Red Car': 12, 'Red car': 17, 'red car': 10, 'Yellow car': 18}

And the output should be like:

>>> newData
{'blue car': 94, 'yellow car': 43, 'red car': 39}

Have you made any attempts? There are tons of similar questions on StackOverflow, did you try any of those approaches? — juanpa.arrivillaga, Jan 31 '17 at 18:34
I'm sorry if this is a duplicate, I couldn't find any question that was about this specific problem, the "case insensitive dictionary" ones were not solving my problem. — Vinícius Figueiredo, Jan 31 '17 at 18:47

Abdou · Answer 1 · 2017-01-31T18:44:15.400

2

Use defaultdict:

from collections import defaultdict

newData = defaultdict(int)

for k in data:
    newData[k.lower()]+=data.get(k,0)

# {'blue car': 94, 'red car': 39, 'yellow car': 43}

I hope this helps.

edited Jan 31 '17 at 18:44

answered Jan 31 '17 at 18:34

why not use `data.get(k)`? – Abdou Jan 31 '17 at 18:35
Well, it's unnecessary and will make your code a bit slower. Better yet, iterate over `d.items()`. – juanpa.arrivillaga Jan 31 '17 at 18:37
even if using `.get`, it would be more logical to do it with default value of 0 like `.get(k, 0)`. However, `.get` is absolutely not required here – Moinuddin Quadri Jan 31 '17 at 18:43
Very constructive comments. Mods made. – Abdou Jan 31 '17 at 18:45
@JoranBeasley, `defaultdict` is mainly to avoid setting the key. Otherwise you'd get a `KeyError` exception. – Abdou Jan 31 '17 at 19:11
Are you sure? I am using `int.__radd__`. – Abdou Jan 31 '17 at 19:17
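
To illustrate the point about `KeyError` raised in the comments above, here is a minimal sketch comparing a plain dict with `defaultdict(int)`. The names `plain`, `totals`, and `missing_key` are made up for this illustration and do not appear in the answer.

```python
from collections import defaultdict

plain = {}
missing_key = None
try:
    # += first reads plain['blue car'], which does not exist yet
    plain['blue car'] += 73
except KeyError as exc:
    missing_key = exc.args[0]  # the key that could not be found

# int() returns 0, so a missing key starts at 0 instead of raising
totals = defaultdict(int)
totals['blue car'] += 73
totals['blue car'] += 21

print(missing_key)          # blue car
print(totals['blue car'])   # 94
```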

score 1 · Answer 2 · answered Jan 31 '17 at 18:34

How about using a defaultdict:

from collections import defaultdict

newData = defaultdict(int)  # missing keys start at 0
for k, v in data.items():  # items() runs on Python 2 and 3; iteritems() is Python 2 only
    newData[k.lower()] += v

answered Jan 31 '17 at 18:34

arshajii

score 1 · Answer 3 · answered Jan 31 '17 at 18:35

Try this:

def compress(data):
    newDict = dict()
    for key in data:
        # dict.get() takes its default positionally; it does not accept a default= keyword
        newDict[key.lower()] = newDict.get(key.lower(), 0) + data[key]
    return newDict
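
For reference, a quick check of this approach against the data from the question. This is a sketch rather than the answer's exact code: it iterates over `data.items()`, passes the default to `dict.get` positionally (the method takes no keyword arguments), and the names `new_dict`, `lowered`, and `result` are added for illustration.

```python
def compress(data):
    """Sum the values of keys that are equal once lowercased."""
    new_dict = {}
    for key, value in data.items():
        lowered = key.lower()
        # get() returns 0 the first time a lowercase key is seen
        new_dict[lowered] = new_dict.get(lowered, 0) + value
    return new_dict


data = {'Blue Car': 73, 'blue Car': 21, 'yellow car': 10, 'Yellow Car': 15,
        'Red Car': 12, 'Red car': 17, 'red car': 10, 'Yellow car': 18}

result = compress(data)
print(result)  # {'blue car': 94, 'yellow car': 43, 'red car': 39}
```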

answered Jan 31 '17 at 18:35

Tom

Uriel · Accepted Answer · 2017-01-31T18:43:22.460

1

Using dictionaries and set comprehensions:

>>> {x: sum(v for k, v in data.items() if k.lower()==x) for x in set(map(lambda x: x.lower(), data))}
{'red car': 39, 'blue car': 94, 'yellow car': 43}

Or, more readably:

SET = set(map(lambda x: x.lower(), data))
SUM = lambda x: sum(v for k, v in data.items() if k.lower()==x)
newData = {x: SUM(x) for x in SET}

# newData = {'red car': 39, 'blue car': 94, 'yellow car': 43}

Explained:

SET = set(map(lambda x: x.lower(), data))

obtains all unique lowercase keys,

SUM = lambda x: sum(v for k, v in data.items() if k.lower()==x)

returns the sum of the values for keys in data matching the unique key, and

{x: SUM(x) for x in SET}

pairs each unique key in the set with its summed value.

edited Jan 31 '17 at 18:43

answered Jan 31 '17 at 18:36

Thanks for the solution! I'm still learning comprehensions, so thanks for the explanation as well. – Vinícius Figueiredo Jan 31 '17 at 18:49
This is definitely the least readable, least efficient method posted so far. I'll note you haven't actually used a set comprehension either. You could have done `{x.lower() for x in data}`. – juanpa.arrivillaga Jan 31 '17 at 18:52
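
A sketch of the set comprehension suggested in the comment above, next to a single-pass loop for comparison. The comprehension version rescans `data` once per unique lowercase key, while the loop visits each item once. The names `unique_keys`, `by_comprehension`, and `by_single_pass` are illustrative and not from the answer.

```python
data = {'Blue Car': 73, 'blue Car': 21, 'yellow car': 10, 'Yellow Car': 15,
        'Red Car': 12, 'Red car': 17, 'red car': 10, 'Yellow car': 18}

# Set comprehension, as suggested in the comment
unique_keys = {k.lower() for k in data}

# One full scan of data for every unique lowercase key
by_comprehension = {x: sum(v for k, v in data.items() if k.lower() == x)
                    for x in unique_keys}

# Single pass: each (key, value) pair is visited exactly once
by_single_pass = {}
for k, v in data.items():
    by_single_pass[k.lower()] = by_single_pass.get(k.lower(), 0) + v

print(by_comprehension == by_single_pass)  # True
```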

Joran Beasley · Answer 5 · 2017-01-31T19:08:02.897

I would subclass dict and override the __getitem__ and __setitem__ magic methods:

class NormalizedDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())
    def __setitem__(self, key, value):
        return dict.__setitem__(self, key.lower(), value)

myDict = NormalizedDict()
myDict['aPPles'] = 5
print(myDict)  # {'apples': 5}

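Of course, we can take this further and auto-sum for you:

class NormalizedSumDict(NormalizedDict):
    def __setitem__(self, key, value):
        # Add to the stored value when the key exists and the types match
        if key.lower() in self and type(self[key]) == type(value):
            try:
                value = value + self[key]
            except TypeError:
                pass  # values cannot be added; fall through and overwrite
        NormalizedDict.__setitem__(self, key, value)
    def update(self, other):
        # Route every item through __setitem__ so values are summed
        for k, v in other.items():
            self[k] = v

d = NormalizedSumDict()
d['aPPles'] = 5
d['Apples'] = 2
print(d)  # {'apples': 7}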
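
A sketch of the same idea on Python 3, applied to the data from the question through `update`. One caveat, offered as an observation about this sketch rather than about the answer's intent: only `__getitem__` and `__setitem__` are overridden, so membership tests and `get` still see only the stored lowercase keys, and passing data to the constructor bypasses `__setitem__` in CPython. The variable `unsummed` is added here to show that last point.

```python
class NormalizedDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())

    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)


class NormalizedSumDict(NormalizedDict):
    def __setitem__(self, key, value):
        # Sum with the stored value when the key exists and the types match
        if key.lower() in self and type(self[key]) == type(value):
            try:
                value = value + self[key]
            except TypeError:
                pass  # not addable; overwrite instead
        NormalizedDict.__setitem__(self, key, value)

    def update(self, other):
        # Go through __setitem__ so every value is normalized and summed
        for k, v in other.items():
            self[k] = v


data = {'Blue Car': 73, 'blue Car': 21, 'yellow car': 10, 'Yellow Car': 15,
        'Red Car': 12, 'Red car': 17, 'red car': 10, 'Yellow car': 18}

d = NormalizedSumDict()
d.update(data)
print(d)                   # {'blue car': 94, 'yellow car': 43, 'red car': 39}
print(d['BLUE CAR'])       # 94, lookups by [] are case insensitive
print('Blue Car' in d)     # False, __contains__ is not overridden
print(d.get('Blue Car'))   # None, get() does not go through __getitem__

# The constructor copies keys as they are, without lowercasing or summing
unsummed = NormalizedSumDict(data)
print(len(unsummed))       # 8
```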
