2

I have a csv similar to below and wanted transpose as JSON

Input:

Output I expect to get:

{
"C1" :[ {"header2" : "name1", "header 3" : "address 1"}, {"header2" : "name3", "header 3" : "address 3"}],
"C2" : [ {"header2" : "name2", "header 3" : "address 2"}]
}
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
JackJack
  • 75
  • 6

1 Answers1

2
  • Based on some comments, some people are just pandas haters. But I like to use the tool that allows me to solve the problem in the easiest manner possible, and with the fewest lines of code.
    • In this case, without a doubt, that's pandas
    • An added benefit of using pandas, is the data can easily be clean, analyzed , and visualized, if needed.
    • Solutions at How to convert CSV file to multiline JSON? offer some basics, but won't help transform the csv into the required shape.
  • Because of the expected output of the JSON file, this is a non-trivial question, which requires reshaping/grouping the data in the csv and is easily accomplished with pandas.DataFrame.groupby.
    • groupby 'h1' since the column values will be the dict outer keys
    • groupby returns a DataFrameGroupBy object that can be split into, i, the value used to create the group ('c1' and 'c2' in this case) and the associated dataframe group, g.
import json
import pandas as pd

# read the file
df = pd.DataFrame('test.csv')

# display(df)
   h1  h2  h3
0  c1  n1  a1
1  c2  n2  a2
2  c1  n3  a3

# groupby and create dict
data_dict = dict()
for i, g in df.groupby('h1'):
    data_dict[i] = g.drop(columns=['h1']).to_dict(orient='records')

# print(data_dict)
{'c1': [{'h2': 'n1', 'h3': 'a1'}, {'h2': 'n3', 'h3': 'a3'}],
 'c2': [{'h2': 'n2', 'h3': 'a2'}]}

# save data_dict to a file as a JSON
with open('result.json', 'w') as fp:
    json.dump(data_dict, fp)

JSON file

{
    "c1": [{
            "h2": "n1",
            "h3": "a1"
        }, {
            "h2": "n3",
            "h3": "a3"
        }
    ],
    "c2": [{
            "h2": "n2",
            "h3": "a2"
        }
    ]
}
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • 1
    I'm [far from a Pandas hater](https://stackoverflow.com/search?tab=votes&q=user%3a1426065%20pandas), I just (mistakenly) though it would be just as easy and less resource-intensive to use the `csv` and `json` modules directly. However, I didn't take into account the data munging that had to take place to give the results the OP was after. Have an upvote. – MattDMo Jul 10 '20 at 18:46