-2

I have a DF with the following columns and data:

enter image description here

I hope it could be converted to two columns, studentid and info, with the following format.

enter image description here

the dataset is """

studentid   course  teacher grade   rank
1   math    A   91  1
1   history B   79  2
2   math    A   88  2
2   history B   83  1
3   math    A   85  3
3   history B   76  3

and the desire output is

studentid   info
1   "{""math"":[{""teacher"":""A"",""grade"":91,""rank"":1}],
""history"":[{""teacher"":""B"",""grade"":79,""rank"":2}]}"
2   "{""math"":[{""teacher"":""A"",""grade"":88,""rank"":2}],
""history"":[{""teacher"":""B"",""grade"":83,""rank"":1}]}"
3   "{""math"":[{""teacher"":""A"",""grade"":85,""rank"":3}],
""history"":[{""teacher"":""B"",""grade"":76,""rank"":3}]}"
K.Liang
  • 1
  • 2

2 Answers2

0

You don't really need groupby() and the single sub-dictionaries shouldn't really be in a list, but as value's for the nested dict. After setting the columns you want as index, with df.to_dict() you can achieve the desired output:

df = df.set_index(['studentid','course'])

df.to_dict(orient='index')

Outputs:

{(1, 'math'): {'teacher': 'A', 'grade': 91, 'rank': 1},
 (1, 'history'): {'teacher': 'B', 'grade': 79, 'rank': 2},
 (2, 'math'): {'teacher': 'A', 'grade': 88, 'rank': 2},
 (2, 'history'): {'teacher': 'B', 'grade': 83, 'rank': 1},
 (3, 'math'): {'teacher': 'A', 'grade': 85, 'rank': 3},
 (3, 'history'): {'teacher': 'B', 'grade': 76, 'rank': 3}}
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • thanks for your reply, I just modified the post, and put the dataset and the desire output – K.Liang Aug 09 '22 at 10:25
  • the problem I got is i want the output be 2 columns, student column store the unique id value, and the info column store all other columns in a nested json fomat – K.Liang Aug 09 '22 at 10:29
  • https://stackoverflow.com/questions/63858239/spark-scala-dataframe-columns-to-nested-json – K.Liang Aug 09 '22 at 10:38
  • the problem I got as same as the link above, but I do not know how to convert it into python language – K.Liang Aug 09 '22 at 10:39
0

Considering that the initial dataframe is df, there are various options, depending on the exact desired output.

  • If one wants the info column to be a dictionary of lists, this will do the work

    df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='list')).reset_index(name='info')
    
    [Out]:
    
       studentid                                               info
    0          1  {'course': ['math', 'history'], 'teacher': ['A...
    1          2  {'course': ['math', 'history'], 'teacher': ['A...
    2          3  {'course': ['math', 'history'], 'teacher': ['A...
    
  • If one wants a list of dictionaries, then do the following

    df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='records')).reset_index(name='info')
    
    [Out]:
    
       studentid                                               info
    0          1  [{'course': 'math', 'teacher': 'A', 'grade': 9...
    1          2  [{'course': 'math', 'teacher': 'A', 'grade': 8...
    2          3  [{'course': 'math', 'teacher': 'A', 'grade': 8...
    
Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83