Convert Pandas Dataframe to nested json-keep 2 columns

Question

I have a DF with the following columns and data:

enter image description here

I hope it could be converted to two columns, studentid and info, with the following format.

enter image description here

the dataset is """

studentid   course  teacher grade   rank
1   math    A   91  1
1   history B   79  2
2   math    A   88  2
2   history B   83  1
3   math    A   85  3
3   history B   76  3

and the desire output is

studentid   info
1   "{""math"":[{""teacher"":""A"",""grade"":91,""rank"":1}],
""history"":[{""teacher"":""B"",""grade"":79,""rank"":2}]}"
2   "{""math"":[{""teacher"":""A"",""grade"":88,""rank"":2}],
""history"":[{""teacher"":""B"",""grade"":83,""rank"":1}]}"
3   "{""math"":[{""teacher"":""A"",""grade"":85,""rank"":3}],
""history"":[{""teacher"":""B"",""grade"":76,""rank"":3}]}"

df1=df.groupby('studentid').apply(lambda x: x[['course','teacher','grade','rank']].to_dict(orient='records')) — K.Liang, Aug 09 '22 at 10:07
Please post the data as text, not screenshot. Is this code not working? Shouldn't it be part of the question ? — Celius Stingher, Aug 09 '22 at 10:08

score 0 · Answer 1 · answered Aug 09 '22 at 10:13

0

You don't really need groupby() and the single sub-dictionaries shouldn't really be in a list, but as value's for the nested dict. After setting the columns you want as index, with df.to_dict() you can achieve the desired output:

df = df.set_index(['studentid','course'])

df.to_dict(orient='index')

Outputs:

{(1, 'math'): {'teacher': 'A', 'grade': 91, 'rank': 1},
 (1, 'history'): {'teacher': 'B', 'grade': 79, 'rank': 2},
 (2, 'math'): {'teacher': 'A', 'grade': 88, 'rank': 2},
 (2, 'history'): {'teacher': 'B', 'grade': 83, 'rank': 1},
 (3, 'math'): {'teacher': 'A', 'grade': 85, 'rank': 3},
 (3, 'history'): {'teacher': 'B', 'grade': 76, 'rank': 3}}

answered Aug 09 '22 at 10:13

Celius Stingher

17,835
6
23
53

thanks for your reply, I just modified the post, and put the dataset and the desire output – K.Liang Aug 09 '22 at 10:25
the problem I got is i want the output be 2 columns, student column store the unique id value, and the info column store all other columns in a nested json fomat – K.Liang Aug 09 '22 at 10:29
https://stackoverflow.com/questions/63858239/spark-scala-dataframe-columns-to-nested-json – K.Liang Aug 09 '22 at 10:38
the problem I got as same as the link above, but I do not know how to convert it into python language – K.Liang Aug 09 '22 at 10:39

score 0 · Answer 2 · answered Nov 25 '22 at 13:10

Considering that the initial dataframe is df, there are various options, depending on the exact desired output.

If one wants the info column to be a dictionary of lists, this will do the work

df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='list')).reset_index(name='info')

[Out]:

   studentid                                               info
0          1  {'course': ['math', 'history'], 'teacher': ['A...
1          2  {'course': ['math', 'history'], 'teacher': ['A...
2          3  {'course': ['math', 'history'], 'teacher': ['A...

If one wants a list of dictionaries, then do the following

df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='records')).reset_index(name='info')

[Out]:

   studentid                                               info
0          1  [{'course': 'math', 'teacher': 'A', 'grade': 9...
1          2  [{'course': 'math', 'teacher': 'A', 'grade': 8...
2          3  [{'course': 'math', 'teacher': 'A', 'grade': 8...

Convert Pandas Dataframe to nested json-keep 2 columns

2 Answers2