Expand a dict containing list items into a list of dict pairs

Question

If I have a dictionary containing lists in one or more of its values:

data = {
  'a':0,
  'b':1,
  'c':[0, 1, 2],
  'pair':['one','two']
}

How can I get a list of tuples of dicts, where each tuple pairs up the values of `pair` and the list iterates over `c`, with everything else held constant? E.g.

output = [
    ({
        'a':0,
        'b':1,
        'c':0,
        'pair':'one'
    },
    {
        'a':0,
        'b':1,
        'c':0,
        'pair':'two'
    }),
    ({
        'a':0,
        'b':1,
        'c':1,
        'pair':'one'
    },
    ...
]

Do you know the keys of the values that you want to “expand” in advance? Do I understand you correctly that the resulting list is akin to a Cartesian product of the expanded values? — David Foerster, Jun 24 '18 at 09:52
Your idea of expanding a dict would make an interesting question. This question was specifically about getting "a list of dict tuples paired by `pair` and iterating over `c`, with all else remaining constant" given the starting dict. — rer, Jun 24 '18 at 16:39
I honestly think you want to try solve a problem with a wrong solution. Maybe post the input and desired result and describe the problem area. — Petr Szturc, Jun 27 '18 at 13:57

score 6 · Accepted Answer · edited Jun 24 '18 at 02:48

Well, this doesn't feel especially elegant, but you might use a nested for loop or list comprehension:

output = []
for i in data['c']:
  output.append(tuple({'a': 0, 'b': 1, 'c': i, 'pair': p} for p in data['pair']))

or

output = [tuple({'a': 0, 'b': 1, 'c': i, 'pair': p} for p in data['pair']) for i in data['c']]

A cleaner solution might separate out the generation of the component dict into a function, like this:

def gen_output_dict(c, pair):
  return {'a': 0, 'b': 1, 'c': c, 'pair': pair}

output = []
for i in data['c']:
  output.append(tuple(gen_output_dict(i, p) for p in data['pair']))
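
If the constant keys should not be hard-coded, one possible variation is to copy `data` and override only the two list-valued keys. This is a sketch, assuming Python 3.5+ for the `{**d}` unpacking syntax and assuming that only `c` and `pair` hold lists; the helper name `expand_pairs` is made up for illustration:

```python
data = {'a': 0, 'b': 1, 'c': [0, 1, 2], 'pair': ['one', 'two']}

def expand_pairs(d):
    # Hypothetical helper: copy every key from d, then override 'c' and 'pair'
    # with a single value each, so 'a', 'b', etc. are carried along unchanged.
    return [
        tuple({**d, 'c': c, 'pair': p} for p in d['pair'])
        for c in d['c']
    ]

output = expand_pairs(data)
print(output[0])
```

Each dict is a fresh copy, so `data` itself is left untouched.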

edited Jun 24 '18 at 02:48

rer

answered Jun 24 '18 at 02:15

davidshere

What if the OP has tons of keys and values? Then they have to go through the whole dictionary and find the keys whose values are lists. – U13-Forward Jun 24 '18 at 02:17
You know what @U8-Forward, you're probably right that OP was asking about a more general case of iterating over the lists. – davidshere Jun 24 '18 at 02:21

Olivier Melançon · Answer 2 · 2018-06-24T02:39:49.907

4

You can use itertools.product on list values and keep track of the key from which each element originated. Since the key 'pair' has a special meaning, you should treat it separately.

Code

from itertools import product

def unzip_dict(d):
    keys = [k for k, v in d.items() if isinstance(v, list) and k != 'pair']
    value_lists = [d[k] for k in keys]

    for combo in product(*value_lists):
        yield tuple({**d, **dict(zip(keys, combo)), 'pair': pair} for pair in d['pair'])

Example

data = {
    'a': 0,
    'c': [1, 2],
    'pair': ['one', 'two']
}

print(*unzip_dict(data))

Output

({'a': 0, 'c': 1, 'pair': 'one'}, {'a': 0, 'c': 1, 'pair': 'two'})
({'a': 0, 'c': 2, 'pair': 'one'}, {'a': 0, 'c': 2, 'pair': 'two'})
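
As a further check of the general case, here is a small sketch with two list-valued keys besides 'pair'. The `sample` input is invented for illustration, and the function is repeated so that the snippet runs on its own. It should yield one tuple per combination of `b` and `c`:

```python
from itertools import product

def unzip_dict(d):
    keys = [k for k, v in d.items() if isinstance(v, list) and k != 'pair']
    for combo in product(*(d[k] for k in keys)):
        yield tuple({**d, **dict(zip(keys, combo)), 'pair': pair} for pair in d['pair'])

# Made-up input: 'b' and 'c' are both lists, so 2 * 2 = 4 tuples are expected.
sample = {'a': 0, 'b': [10, 20], 'c': [1, 2], 'pair': ['one', 'two']}
result = list(unzip_dict(sample))

print(len(result))
print(result[0])
```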

edited Jun 24 '18 at 02:39

answered Jun 24 '18 at 02:16

Olivier Melançon

This was my instinct too, but then I noticed that they have a requirement about pairing items in tuples inside the list based on the `pair` key's value (in fact, based on the `c` key's value). You could still use itertools product, but it'd be a bit less general. – jedwards Jun 24 '18 at 02:20
@jedwards Here is the updated version; thanks for the heads-up. – Olivier Melançon Jun 24 '18 at 02:34

Abdou · Answer 3 · 2018-06-24T04:05:16.877

The following is quite an extended solution:

data = {
  'a':0,
  'b':1,
  'c':[0, 1, 2],
  'pair':['one','two']
}

# Get the length of the longest sequence
length = max(map(lambda x: len(x) if isinstance(x, list) else 1, data.values()))

# Loop through the data and change scalars to sequences
# while also making sure that smaller sequences are stretched to match
# or exceed the length of the longest sequence
for k, v in data.items():
    if isinstance(v, list):
        # Ceiling division, so the repeated list is never shorter than length
        data[k] = v * -(-length // len(v))
    else:
        data[k] = [v] * length

# Create a dictionary to keep track of which outputs
# need to end up in which tuple
seen = dict.fromkeys(data.get('pair'), 0)
output = [tuple()] * len(seen)

# Loop through the data and place dictionaries in their
# corresponding tuples.
for v in zip(*data.values()):
    d = dict(zip(data, v))
    output[seen[d.get('pair')]] += (d,)
    seen[d.get('pair')] += 1

print(output)

The idea is to convert the scalars in your data to sequences whose lengths match that of the longest sequence in the original data. So the first step is to assign the size of the longest sequence to the variable length. With that, we loop through the original data, converting scalars to sequences and extending the existing sequences to match the size of the longest one. Once that's done, we generate the output variable. First, we create a dictionary called seen, which helps us both build the list of tuples and keep track of which group of dictionaries ends up in which tuple. A final loop then places the dictionaries in their corresponding tuples.

The current output looks like the following:

[({'a': 0, 'b': 1, 'c': 0, 'pair': 'one'},
  {'a': 0, 'b': 1, 'c': 1, 'pair': 'two'}),
 ({'a': 0, 'b': 1, 'c': 2, 'pair': 'one'},)]
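
The stretching step described above can also be sketched without modifying `data` in place. This is an alternative formulation, not the code from this answer: it uses `itertools.cycle` and `itertools.islice`, and the helper name `stretch` is made up for illustration.

```python
from itertools import cycle, islice

data = {'a': 0, 'b': 1, 'c': [0, 1, 2], 'pair': ['one', 'two']}

def stretch(value, length):
    # Hypothetical helper: repeat a scalar, or cycle through a list,
    # until exactly `length` items have been produced.
    items = value if isinstance(value, list) else [value]
    return list(islice(cycle(items), length))

length = max(len(v) if isinstance(v, list) else 1 for v in data.values())
stretched = {k: stretch(v, length) for k, v in data.items()}

# One dict per position across the stretched sequences.
rows = [dict(zip(stretched, values)) for values in zip(*stretched.values())]
print(rows)
```

The resulting `rows` should contain the same three dictionaries as the output shown above, before they are grouped into tuples.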

Please let me know if you need any more clarifying details. Otherwise, I do hope this serves some purpose.

score 1 · Answer 4 · answered Jun 24 '18 at 05:45

@r3robertson, you can also try the code below. It is based on a list comprehension and the deepcopy() operation in Python.

Check Shallow copy vs deepcopy in Python.

import pprint
import copy

data = {
    'a': 0,
    'b': 1,
    'c': [0, 1, 2],
    'pair': ['one', 'two'],
}

def get_updated_dict(data, index, pair_name):
    d = copy.deepcopy(data)
    d.update({'c': index, 'pair': pair_name})
    return d

output = [tuple(get_updated_dict(data, index, pair_name) for pair_name in data['pair']) for index in data['c']]

# Pretty printing the output list.
pprint.pprint(output, indent=4)

Output »

[   (   {   'a': 0, 'b': 1, 'c': 0, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 0, 'pair': 'two'}),
    (   {   'a': 0, 'b': 1, 'c': 1, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 1, 'pair': 'two'}),
    (   {   'a': 0, 'b': 1, 'c': 2, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 2, 'pair': 'two'})]
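
On the shallow-versus-deep copy point mentioned above, here is a small illustration (a sketch, not part of the solution itself): `copy.copy` duplicates only the outer dict, so the lists inside are still shared with the original, while `copy.deepcopy` duplicates the nested lists as well.

```python
import copy

data = {'a': 0, 'c': [0, 1, 2]}

shallow = copy.copy(data)
deep = copy.deepcopy(data)

# The shallow copy shares the inner list with the original...
print(shallow['c'] is data['c'])   # True
# ...while the deep copy has its own list.
print(deep['c'] is data['c'])      # False

# Mutating the original list is therefore visible through the shallow copy only.
data['c'].append(3)
print(shallow['c'])
print(deep['c'])
```

Since `get_updated_dict` replaces `c` and `pair` with scalars, a shallow copy would probably be enough for this particular input; `deepcopy` is the safer choice if other values may be mutable.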

Pretty printing using json module »

Note: tuples are converted to lists here, because JSON has no tuple type.

import json
print(json.dumps(output, indent=4))

Output »

[
    [
        {
            "a": 0,
            "c": 0,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 0,
            "b": 1,
            "pair": "two"
        }
    ],
    [
        {
            "a": 0,
            "c": 1,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 1,
            "b": 1,
            "pair": "two"
        }
    ],
    [
        {
            "a": 0,
            "c": 2,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 2,
            "b": 1,
            "pair": "two"
        }
    ]
]
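
The tuple-to-list conversion noted above is easy to confirm with a minimal sketch (the `pairs` value is a made-up stand-in for `output`): JSON has only arrays, so `json.dumps` writes tuples as arrays and `json.loads` gives back lists.

```python
import json

pairs = [({'c': 0, 'pair': 'one'}, {'c': 0, 'pair': 'two'})]

encoded = json.dumps(pairs)
decoded = json.loads(encoded)

print(type(pairs[0]).__name__)    # tuple
print(type(decoded[0]).__name__)  # list

# If tuples are needed again after loading, convert them back explicitly.
restored = [tuple(group) for group in decoded]
print(restored == pairs)          # True
```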

score 0 · Answer 5 · edited Jun 20 '20 at 09:12

It's not perfect, but here's my solution.

data = {'a': 0, 'b': 1, 'c': [0, 1, 2], 'pair': ['one', 'two']}
pairs, cs = data['pair'], data['c']
for c in cs:
    # Build a fresh dict for each pair value instead of mutating data in place,
    # and print one tuple per value of c.
    print(tuple({**data, 'c': c, 'pair': p} for p in pairs))

edited Jun 20 '20 at 09:12

Community

answered Jun 24 '18 at 03:29

Nomeh Uchenna Gabriel

score 0 · Answer 6 · answered Jun 07 '22 at 07:53

If you know in advance which key holds the list, you can expand it directly, as below:

record_dict = {'id': '0123abc', 'roles': ['abc', 'cda', 'xyz']}
output = []

for role in record_dict['roles']:
    output.append({'id': record_dict.get('id'), 'roles': role})

print(output)

Ajax1234 · Answer 7 · 2018-06-24T02:36:25.173

You can use itertools:

import itertools
data = {
  'a':0,
  'b':1,
  'c':[0, 1, 2],
  'pair':['one','two']
}
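def expand_dict(data):
    # Keys whose values are lists, in insertion order
    grouped = [a for a, b in data.items() if isinstance(b, list)]
    # Group the Cartesian product by the value of the first list-valued key
    combos = itertools.product(*[data[i] for i in grouped])
    p = [[a, list(b)] for a, b in itertools.groupby(combos, key=lambda x: x[0])]
    return [tuple({**data, **dict(zip(grouped, i))} for i in c) for _, c in p]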

print(expand_dict(data))

Output:

[({'a': 0, 'b': 1, 'c': 0, 'pair': 'one'}, {'a': 0, 'b': 1, 'c': 0, 'pair': 'two'}), 
 ({'a': 0, 'b': 1, 'c': 1, 'pair': 'one'}, {'a': 0, 'b': 1, 'c': 1, 'pair': 'two'}), 
 ({'a': 0, 'b': 1, 'c': 2, 'pair': 'one'}, {'a': 0, 'b': 1, 'c': 2, 'pair': 'two'})]

This solution also works on input with more than two list-valued keys:

data = {'a':[5, 6, 1, 3], 'b':1, 'c':[0, 1, 2], 'pair':['one', 'two']}
print(expand_dict(data))
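
Note how the result is shaped in that case. A quick check (a sketch that repeats `expand_dict` so it runs on its own) suggests that the tuples are grouped by the first list-valued key only, here `a`, so each tuple holds every combination of the remaining list values rather than just a pair:

```python
import itertools

def expand_dict(data):
    grouped = [a for a, b in data.items() if isinstance(b, list)]
    p = [[a, list(b)] for a, b in itertools.groupby(
        itertools.product(*[data[i] for i in grouped]), key=lambda x: x[0])]
    return [tuple({**data, **dict(zip(grouped, i))} for i in c) for _, c in p]

data = {'a': [5, 6, 1, 3], 'b': 1, 'c': [0, 1, 2], 'pair': ['one', 'two']}
result = expand_dict(data)

print(len(result))       # one tuple per value of 'a'
print(len(result[0]))    # 3 values of 'c' times 2 values of 'pair'
print(result[0][0])
```

If strict pairs are required for such input, grouping on everything except `pair` (as in the `itertools.product` answer above that treats `pair` separately) may be a better fit.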