0

I think this problem is similar to the previous questions in this forum. However, I still have a problem with how to create a nested dictionary from a list of tuples.

Suppose I have the following tuple:

my_list = [
 ('actor', 'actor_id', 'integer', 'NO'),
 ('actor', 'first_name', 'character varying', 'NO'),
 ('actor_info', 'actor_id', 'integer', 'YES'),
 ('actor_info', 'first_name', 'character varying', 'YES')]

# convert into DataFrame
col = ['table', 'col_name', 'dtype', 'isnull']
df = pd.DataFrame(mylist, columns=col)

print(df)

        table     col_name                        dtype   isnull
0       actor     actor_id                      integer     NO
1       actor   first_name            character varying     NO
2  actor_info     actor_id                      integer    YES
3  actor_info   first_name            character varying    YES

Current result:

{
 'actor': {
    'actor_id': {'dtype': 'integer', 'isnull': 'NO'},
    'first_name': {'dtype': 'character varying', 'isnull': 'NO'}
    },
 'actor_info': {
    'actor_id': {'dtype': 'integer', 'isnull': 'YES'},
    'first_name': {'dtype': 'character varying', 'isnull': 'YES'}
    }
}

The expected result (it should be grouped by table_name):

{
 'actor': 
    [
        {'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
        {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}
    ],
 'actor_info': 
    [
        {'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
        {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}
    ]
}

I've tried to make a nested dictionary by converting my_list to DataFrame. However, I cannot get the desired output result. This is my current code:

# convert to nested dictionary
ff = df.iloc.groupby(['table'])[['col_name','dtype','isnull']].apply(lambda x: x.set_index('col_name').to_dict(orient='index')).to_dict()

# convert to JSON
print(json.dumps(ff, indent=1))

Could you help me to solve this kind of problem?

I'm also curious about how to solve this problem without converting to DataFrame (e.g., list comprehension, nested looping). Any help to clear this problem up would be much appreciated. Thanks

furanzup
  • 91
  • 1
  • 8
  • Your output is unclear, do you mean it should be `table_name -> list of dicts` there? Because obviously, you can't have repeating keys in a dict (`column`). Secondly, unless you really intend to use `pandas` for further manipulation, should definitely just stick to plain Python. Seems a bit overkill. – kva1966 Nov 01 '21 at 06:31
  • @kva1966 Oh sorry. Thank you for the correction and suggestion. I've edited my question (expected result). – furanzup Nov 01 '21 at 06:50
  • No problem. Just still a bit curious, do you want a list? Because if you use curly braces (`{}`) that implies a dictionary (in your current edit). If you want the column metadata to be inside a list, it should be `[ ... ]` (square brackets). Sorry to be nitpicky, it's just that your expected output is still not quite right, put that into Ipython or similar, and it'll error out, since a dictionary must be a key value pair, you have `{ val1, val2 }`. – kva1966 Nov 01 '21 at 06:55
  • @kva1966 Ah I see. Yes, I just want to return the column metadata for each column. Actually, I was a bit confused to design the output, should I put `[]` or `{}` as the output. Thankfully, you gave me the explanation. Thanks. – furanzup Nov 01 '21 at 07:02

2 Answers2

3

As per my comment, I'm assuming you'll want a list of columns metadata per table name, rather dict of dicts.

If so, this is one straightforward approach.

from collections import defaultdict

d = defaultdict(list)

for tablename, col, dtype, isnull in my_list:
    d[tablename].append({ 'column': col, 'dtype': dtype, 'isnull': isnull })

Output (in ipython):

In [19]: d
Out[19]:
defaultdict(list,
            {'actor': [{'column': 'actor_id',
               'dtype': 'integer',
               'isnull': 'NO'},
              {'column': 'first_name',
               'dtype': 'character varying',
               'isnull': 'NO'}],
             'actor_info': [{'column': 'actor_id',
               'dtype': 'integer',
               'isnull': 'YES'},
              {'column': 'first_name',
               'dtype': 'character varying',
               'isnull': 'YES'}]})

In [20]: d['actor']
Out[20]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'NO'},
 {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'NO'}]

In [21]: d['actor_info']
Out[21]:
[{'column': 'actor_id', 'dtype': 'integer', 'isnull': 'YES'},
 {'column': 'first_name', 'dtype': 'character varying', 'isnull': 'YES'}]
kva1966
  • 611
  • 6
  • 8
  • 1
    Thank you for the help. Much appreciated. – furanzup Nov 01 '21 at 06:58
  • Excuse me @kva1966, is it possible for me to slice/filter the data on dict `d` based on `tablename`? for example, I want to get data from `['actor','actor_info','address']` (suppose I have more than 3 tablenames). Thanks – furanzup Nov 05 '21 at 05:09
  • https://stackoverflow.com/questions/29216889/slicing-a-dictionary#47704499 Just a general tip: please search first before asking, since this was a straightforward question. Secondly, to make best use of Stackoverflow is to be very specific with questions, and have separate questions if the immediate query is solved. If the slice answer in the link is not what you had in mind, then open a new question, as you did here, with sample data and sample output. – kva1966 Nov 05 '21 at 06:02
  • Btw, if my answer solved your query, could you Accept it as the answer you were looking for? All good if it did not, or you are waiting for more. – kva1966 Nov 05 '21 at 06:05
1

Quite easy with a dictionary comprehension:

from itertools import groupby

{k: {a: dict(zip(('column', 'dtype', 'isnull'), b])) for _,a,*b in g}
 for k,g in groupby(my_list, lambda t: t[0])}

NB. groupby assumes that the initial array is sorted by the grouping key, if not you need to sort it first

Output:

{'actor': {'actor_id': {'column': 'integer', 'dtype': 'NO'},
  'first_name': {'column': 'character varying', 'dtype': 'NO'}},
 'actor_info': {'actor_id': {'column': 'integer', 'dtype': 'YES'},
  'first_name': {'column': 'character varying', 'dtype': 'YES'}}}
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Hi @mozway, thanks for your help. Much appreciated. I tried your solution, but it returns only the last values for each item. For example. table `actor` has two `col_name`: `actor_id` and `first_name`, and it returns only column `first_name`. Could you please explain more about this? – furanzup Nov 01 '21 at 06:14
  • Why use slicing in this answer when you can use multiple assignment? `{k: dict(zip(('column', 'dtype', 'isnull'), v)) for k, *v in my_list}` – Jab Nov 01 '21 at 06:37
  • @furanzup sorry, I had not seen the nested level, see the updated answer – mozway Nov 01 '21 at 06:47
  • @Jab yes both are possible, I updated the answer – mozway Nov 01 '21 at 06:48
  • @mozway Wow, thanks for your answer. Much appreciated. – furanzup Nov 01 '21 at 06:51
  • @Jab Thanks for your answer. Much appreciated. – furanzup Nov 01 '21 at 06:52