So, instead of trying to explain things first, I will just show you what I have and what I want (this is easier):

What I have:

dict_list = [
    {'some': 1.2, 'key': 1.3, 'words': 3.9, 'label': 0},
    {'other': 1.2, 'wordly': 1.3, 'words': 3.9, 'label': 1},
    {'other': 10, 'work': 1.3, 'like': 3.9, 'label': 1},
]

What I want to get from what I have:

dict_dict = { "0":{'some': 1.2, 'key': 1.3, 'words': 3.9},
              "1":{'other': 10, 'wordly': 1.3, 'work': 1.3, 'like': 3.9, 'words': 3.9},
}

Explanation:

I want to build a new dictionary that uses the "label" values as its top-level keys (as strings). Dictionaries that share a label need to be merged, and when a key appears more than once during the merge, the highest value must be kept (like the "other" key in the example).

Why don't I do all of this before I create the original list of dicts?

Because dict_list is the result of a joblib (multiprocessing) job. Sharing objects between processes slows the multiprocessing down, so instead of sharing, I decided to run the heavy work on multiple cores and do the organizing afterwards. I am not sure whether this approach will help, but I can't know without testing.

MehmedB

2 Answers

1

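collections.Counter has a nice merging feature: the union operator a|b joins two counters and keeps the higher value for each key. Be aware that it drops any key whose resulting value is zero or negative.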

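from collections import Counter

dict_dict = {}
for dictionary in dict_list:
    # Note: pop() also removes 'label' from the original dicts in dict_list
    label = str(dictionary.pop('label'))
    # Counter union keeps the higher value for keys present in both
    dict_dict[label] = dict_dict.get(label, Counter()) | Counter(dictionary)

# If you don't need Counters, just convert back to plain dictionaries
dict_dict = {label: dict(counts) for label, counts in dict_dict.items()}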
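
If the Counter round trip is a concern (memory, or values that may be zero or negative), the same merge can be done with plain dictionaries. This is only a minimal sketch, not a benchmarked solution; the helper name merge_by_label is made up for illustration, and it reads "label" instead of popping it so the input list is left untouched.

```python
def merge_by_label(dict_list):
    """Group dicts by their 'label' value, keeping the max for duplicate keys."""
    merged = {}
    for d in dict_list:
        label = str(d["label"])  # read instead of pop: the input stays intact
        target = merged.setdefault(label, {})
        for key, value in d.items():
            if key == "label":
                continue
            # Keep the larger value when the key was already seen for this label
            if key not in target or value > target[key]:
                target[key] = value
    return merged


dict_list = [
    {'some': 1.2, 'key': 1.3, 'words': 3.9, 'label': 0},
    {'other': 1.2, 'wordly': 1.3, 'words': 3.9, 'label': 1},
    {'other': 10, 'work': 1.3, 'like': 3.9, 'label': 1},
]

dict_dict = merge_by_label(dict_list)
print(dict_dict)
```

Only one inner dictionary is created per label, and existing entries are updated in place rather than rebuilt on every merge.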
Himanshu Sheoran
  • This seems like it returns what I want. I have some concerns about the memory usage of it. I need to do some tests. Thank you! – MehmedB Oct 01 '20 at 08:14
  • Is there a straightforward way to do this without generating a new dictionary? – MehmedB Oct 01 '20 at 08:19
  • `a|b`, i.e. keeping the higher value for shared keys, is a `Counter` feature. Plain dicts only gain a `|` operator in Python 3.9, and for duplicate keys it keeps the right-hand operand's value rather than the higher one, so it does not replace `Counter` here. Otherwise you need to generate at least one new dictionary and update it with the maximum value by iterating over the keys. I wrote this just for ease of programming. – Himanshu Sheoran Oct 02 '20 at 02:36
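
For anyone comparing the two `|` operators mentioned in these comments, here is a small illustrative check (assuming Python 3.9 or newer for the plain-dict operator; the sample values are made up). The Counter union takes the per-key maximum, while the plain dict union lets the right-hand operand win.

```python
from collections import Counter

a = {"other": 10, "words": 3.9}
b = {"other": 1.2, "work": 1.3}

# Counter union: per-key maximum of the two operands
counter_union = dict(Counter(a) | Counter(b))

# Plain dict union (Python 3.9+): the right-hand value replaces the left one
dict_union = a | b

# Counter union silently drops keys whose result is not positive
dropped = dict(Counter({"x": -1.0}) | Counter())

print(counter_union["other"])  # 10
print(dict_union["other"])     # 1.2
print(dropped)                 # {}
```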
0

Easy peasy:

dict_of_dicts = {i: item for i, item in enumerate(list_of_dicts)}

If you insist on strings as the keys:

dict_of_dicts = {str(i): item for i, item in enumerate(list_of_dicts)}
adir abargil