1

I have a dictionary that resembles the following:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

I would like to merge all keys that have at least one element in common. For example, the resulting dictionary should look like:

dict1 = {'key1':['1','2','3','4','5'],'key3':['6','7','8']}

Please note how key2 has been eliminated. Whether it is key1 or key2 that is eliminated does not matter. I have only gotten as far as being able to identify repeats, but not how to merge them in an iterative fashion. Thanks

Burger King
Vince

3 Answers

2

Would that work for you? Please note that since the order of elements in the dictionary is arbitrary, you cannot guarantee which keys will end up being inserted into the output dictionary.

dict_out = {}
processed = set()
for k1, v1 in dict_in.items():
    if k1 not in processed:
        processed.add(k1)
        vo = v1
        for k2, v2 in dict_in.items():
            if k2 not in processed and set(v1) & set(v2):
                vo = sorted(list(set(vo + v2)))
                processed.add(k2)
        dict_out[k1] = vo

For:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8']}

gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}

And for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['7', '9']}

gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8', '9']}

And finally, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

it gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['5', '6', '7', '8']}

EDIT

OP requested that even outcomes of merges should be merged with each other. To achieve that, we can wrap the code above in a loop like this:

d = dict_in
processed = set([None])
while processed:
    dict_out = {}
    processed = set()
    for k1, v1 in d.items():
        if k1 not in processed:
            vo = v1
            for k2, v2 in d.items():
                if k1 != k2 and set(vo) & set(v2):
                    vo = sorted(list(set(vo + v2)))
                    processed.add(k2)
            dict_out[k1] = vo
    d = dict_out

Then, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

we get:

{'key4': ['1', '2', '3', '4', '5', '6', '7', '8']}

and for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['4', '6', '7'], 'key4': ['8', '9']}

we get:

{'key1': ['1', '2', '3', '4', '5', '6', '7'], 'key4': ['8', '9']}
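
The repeated-merge idea above can also be sketched as a single pass with a disjoint-set (union-find) structure over the keys. This is a swapped-in technique, not the code from this answer, and `merge_overlapping` is a hypothetical name chosen for illustration. It assumes the values are lists of hashable, mutually comparable items, and it makes no promise about which key of a merged group survives.

```python
def merge_overlapping(d):
    """Merge keys of d whose value lists are linked by shared elements."""
    parent = {k: k for k in d}          # disjoint-set forest over the keys

    def find(k):
        # Follow parent links to the group representative, halving the path
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    owner = {}                          # element -> first key seen holding it
    for key, values in d.items():
        for v in values:
            if v in owner:
                ra, rb = find(owner[v]), find(key)
                if ra != rb:
                    parent[rb] = ra     # link the two groups together
            else:
                owner[v] = key

    # Collect every key's values under its group representative
    merged = {}
    for key, values in d.items():
        merged.setdefault(find(key), set()).update(values)
    return {k: sorted(vs) for k, vs in merged.items()}
```

Calling `merge_overlapping(dict_in)` on the first example in this EDIT leaves a single key holding `'1'` through `'8'`; on the second it leaves two groups, `'1'` through `'7'` and `['8', '9']`. Because each element is looked up once, chains of overlaps are resolved without re-running the merge until nothing changes.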
Andrzej Pronobis
  • try `dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8'],'key4': ["9","10"]}` – Padraic Cunningham May 29 '15 at 21:33
  • Gives : `{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8'], 'key4': ['9', '10']}`. Isn't that correct? – Andrzej Pronobis May 29 '15 at 21:35
  • This is great for merging once, but a previously merged key/value pair cannot merge with others. For example: dict_in = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['4','6','7'],'key4':['8','9']} This gives: {'key3': set(['8', '3', '5', '4', '7']), 'key1': ['1', '2', '3'], 'key4': ['9', '10']} which is not right since key1 and key3 both have 3 in them. So close! – Vince May 29 '15 at 21:42
  • That behavior was intentional :) I thought that this is what you actually need. Small edit will fix it. – Andrzej Pronobis May 29 '15 at 21:48
1

If you want to change the original dict, you will need to iterate over a copy:

vals = {k: set(val) for k, val in dict1.items()}

for key in list(dict1):
    if key not in vals:  # already merged into another key
        continue
    for k, v in list(vals.items()):
        if k == key:
            continue
        if v & vals[key]:
            # keep vals in sync so later comparisons see the merged set
            vals[key] |= v
            dict1[key] = sorted(vals[key])
            del vals[k]
            del dict1[k]

If you want to union all:

vals = {k: set(val) for k, val in dict1.items()}
unioned = set()
srt = sorted(dict1.keys())
srt2 = srt[:]
for key in srt:
    if key in unioned:  # already merged into an earlier key
        continue
    # iterate over a copy, because srt2 is modified inside the loop
    for k in srt2[:]:
        if k == key:
            continue
        if vals[k].intersection(dict1[key]):
            unioned.add(k)
            dict1[key] = list(vals[k].union(dict1[key]))
            srt2.remove(k)

for k in unioned:
    del dict1[k]
Padraic Cunningham
  • I like this answer, works perfectly. Andrzej was first though... Thanks for your efforts and assistance. – Vince May 29 '15 at 22:07
  • @Vince no worries, I was not totally sure what should happen when you have updated values, so I added another way that handles that – Padraic Cunningham May 29 '15 at 22:09
0

I have a more compact method that I think is more readable and easier to understand. See the code below:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

# Fix the key order up front so deletions do not disturb the iteration
keys = sorted(dict1)

# Compare every pair of keys once
for i, i_key in enumerate(keys):
    if i_key not in dict1:  # already merged into an earlier key
        continue
    for j_key in keys[i + 1:]:
        if j_key not in dict1:
            continue
        i_value, j_value = set(dict1[i_key]), set(dict1[j_key])
        # Merge when the two value lists share at least one element
        if i_value & j_value:
            dict1[i_key] = sorted(i_value | j_value)
            del dict1[j_key]

print(dict1)
# {'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}
# (key order may differ on older Python versions)
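
A quick way to sanity-check a fully merged result from any of the approaches on this page is to confirm that no two remaining value lists share an element. The following is a minimal sketch; `values_are_disjoint` is a hypothetical helper name, not something from the answers, and it assumes the values are iterables of hashable items.

```python
from itertools import combinations


def values_are_disjoint(d):
    """Return True if no two value lists in d share an element."""
    return all(
        set(a).isdisjoint(b)
        for a, b in combinations(d.values(), 2)
    )
```

If this returns False for a result, at least one pair of keys could still be merged, which is the situation described in the comments on the first answer.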
Burger King