Convert PANDAS dataframe to nested JSON + add array name (Modification)

Question

I'm trying to create a nested Json file from a pandas dataframe. I found a similar question here but when I tried to apply the answer, the output wasn't what I really wanted. I tried to adjust the code to get the desired answer but I haven't been able to.

Let me explain the problem first then I will sow you what I have done so far.

I have the following dataframe:

Region  staff_id    rate    dep
1       300047      77      4
1       300048      45      3
1       300049      32      7
2       299933      63      8
2       299938      86      7

Now I want the json object to look like this:

{'region': 1 : 
   { 'Info': [
      {'ID': 300047, 'Rate': 77, 'Dept': 4},
      {'ID': 300048, 'Rate': 45, 'Dept': 3},
      {'ID': 300049, 'Rate': 32, 'Dept': 7}
      ]
   },
   'region': 2 : 
   { 'Info': [
      {'ID': 299933, 'Rate': 63, 'Dept': 8},
      {'ID': 299938, 'Rate': 86, 'Dept': 7}
      ]
   }
}

So for every region, there is a tag called info and inside info there is all the rows of that region.

I tried this code from the previous answer:

json_output = list(df.apply(lambda row: {"region": row["Region"],"Info": [{
                                         "ID": row["staff_id"], "Rate": row["rate"], "Dept": row["dep"]}]
                                    },
                        axis=1).values)

Which will give me every row in the dataframe and not grouped by the region.

Sorry because this seems repetitive, but I have been trying to change that answer to fit mine and I would really appreciate your help.

Can you try looping over a groupby for Region? See https://stackoverflow.com/questions/43826182/how-to-iterate-over-pandas-dataframegroupby-and-select-all-entries-per-grouped-v for how — Nick ODell, Mar 31 '22 at 01:21

score 1 · Accepted Answer · answered Mar 31 '22 at 12:49

As mention by Nick ODell, you can loop through the group by element

df = pd.DataFrame({"REGION":[1,1,1,2,2], 
               "staff_id": [1,2,3,4,5],
              "rate": [77,45,32,63,86],
              "dep":[4,3,7,8,7]})
desired_op = []
grp_element  = list(df.groupby(["REGION"]))
for i in range(len(grp_element)):
    empty_dict = {} # this dict will store data according to Region
    lst_info = eval(grp_element[i][1][["staff_id","rate","dep"]].to_json(orient='records')) # converting to Json output of grouped data
    empty_dict["REGION"] = grp_element[i][1]["REGION"].values[0] # to get Region number
    empty_dict['info'] = lst_info
    desired_op.append(empty_dict)

print(desired_op)
[{'REGION': 1,
  'info': [{'staff_id': 1, 'rate': 77, 'dep': 4},
           {'staff_id': 2, 'rate': 45, 'dep': 3},
           {'staff_id': 3, 'rate': 32, 'dep': 7}]},
 {'REGION': 2,
  'info': [{'staff_id': 4, 'rate': 63, 'dep': 8},
           {'staff_id': 5, 'rate': 86, 'dep': 7}]}]

Convert PANDAS dataframe to nested JSON + add array name (Modification)

1 Answers1