- Based on some comments, some people are just
pandas
haters. But I like to use the tool that allows me to solve the problem in the easiest manner possible, and with the fewest lines of code.
- In this case, without a doubt, that's
pandas
- An added benefit of using
pandas
, is the data can easily be clean, analyzed , and visualized, if needed.
- Solutions at How to convert CSV file to multiline JSON? offer some basics, but won't help transform the csv into the required shape.
- Because of the expected output of the
JSON
file, this is a non-trivial question, which requires reshaping/grouping the data in the csv and is easily accomplished with pandas.DataFrame.groupby
.
groupby
'h1'
since the column values will be the dict
outer keys
groupby
returns a DataFrameGroupBy
object that can be split into, i
, the value used to create the group ('c1'
and 'c2'
in this case) and the associated dataframe group, g
.
import json
import pandas as pd
# read the file
df = pd.DataFrame('test.csv')
# display(df)
h1 h2 h3
0 c1 n1 a1
1 c2 n2 a2
2 c1 n3 a3
# groupby and create dict
data_dict = dict()
for i, g in df.groupby('h1'):
data_dict[i] = g.drop(columns=['h1']).to_dict(orient='records')
# print(data_dict)
{'c1': [{'h2': 'n1', 'h3': 'a1'}, {'h2': 'n3', 'h3': 'a3'}],
'c2': [{'h2': 'n2', 'h3': 'a2'}]}
# save data_dict to a file as a JSON
with open('result.json', 'w') as fp:
json.dump(data_dict, fp)
JSON file
{
"c1": [{
"h2": "n1",
"h3": "a1"
}, {
"h2": "n3",
"h3": "a3"
}
],
"c2": [{
"h2": "n2",
"h3": "a2"
}
]
}