Create nested JSON lines from a pipe-delimited flat file using Python

Question

I have a pipe-delimited text file like the one below. In this file, the same ID, CODE and NUM combination can have several different INC and INC_DESC values.

ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"

We want to create JSON like the output below, where the different INC and INC_DESC values are collected into an array for each combination of ID, CODE and NUM.

{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001, "INC_DESC":"INC1001"},{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002, "INC_DESC":"INC1002"},{"INC":1003, "INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003, "INC_DESC":"INC1003"}]}

I tried the code below, but it does not generate the nested structure I want.

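import pandas as pd

# raw strings so the backslashes in the Windows paths are taken literally
Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

# writes one flat record per input row, with no nesting
json_output = r'V:\outfile.json'
df.to_json(json_output, orient='records')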

And what you show as expected output is **not** [valid] JSON. It is `ndjson`/`jsonlines` — buran, Sep 30 '22 at 13:57
@Vlad - The last row is having the combination of F2,W2 and 1 — Koushik Chandra, Sep 30 '22 at 14:16
@Vlad - Sorry this was a typo. I corrected in the source data now. — Koushik Chandra, Sep 30 '22 at 14:26
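
To illustrate buran's point about JSON versus JSON Lines, here is a minimal sketch using only the standard library. The two records are shortened versions of the expected output, and the variable names (`text`, `records`, `whole_file_is_json`) are made up for the illustration. A parser that expects a single JSON document rejects the text as a whole, while each line parses fine on its own.

```python
import json

# Two records in JSON Lines form: one complete JSON object per line
text = (
    '{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001,"INC_DESC":"INC1001"}]}\n'
    '{"ID":"F2","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003,"INC_DESC":"INC1003"}]}\n'
)

# Parsing the whole text as one JSON document fails because of the second object
try:
    json.loads(text)
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False

# Parsing line by line works: every line is a valid JSON object
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_file_is_json)
print([r["ID"] for r in records])
```

This is why a reader of the output file has to process it line by line instead of loading it in one call.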

Ze'ev Ben-Tsvi · Answer 1 · 2022-09-30T17:18:19.153

import pandas as pd

# raw strings so the backslashes in the Windows paths are taken literally
Input_File = r'V:\input.dat'
df = pd.read_csv(Input_File, sep='|')

# groupby columns: collect INC and INC_DESC into one list per (ID, CODE, NUM) group
df = df.groupby(['ID', 'CODE', 'NUM']).agg(list).reset_index()
# create new column: pair the two lists into a list of {INC, INC_DESC} records
df['INC_DTL'] = df.apply(
    lambda x: [{'INC': inc, 'INC_DESC': dsc} for inc, dsc in zip(x['INC'], x['INC_DESC'])], axis=1)
# drop old columns
df.drop(['INC', 'INC_DESC'], axis=1, inplace=True)

json_output = r'V:\outfile.json'
# lines=True writes one JSON object per line (JSON Lines)
df.to_json(json_output, orient='records', lines=True)

OUTPUT:

{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1001,"INC_DESC":"INC1001"},{"INC":1002,"INC_DESC":"INC1002"},{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F1","CODE":"W2","NUM":1,"INC_DTL":[{"INC":1003,"INC_DESC":"INC1003"}]}
{"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[{"INC":1002,"INC_DESC":"INC1003"},{"INC":1003,"INC_DESC":"INC1004"}]}
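
For comparison, the same grouping can be sketched without pandas, using only the standard library. This is an alternative, not the approach of the answer above. It assumes the rows from the question are held in an in-memory string (`sample`) instead of being read from `V:\input.dat`, and the names `groups` and `lines` are made up for the illustration.

```python
import csv
import io
import json

# Rows copied from the question; a real run would open the input file instead
sample = '''ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F1"|"W1"|1|1003|"INC1003"
"F2"|"W1"|1|1002|"INC1003"
"F2"|"W1"|1|1003|"INC1004"
"F2"|"W2"|1|1003|"INC1003"
'''

# (ID, CODE, NUM) -> list of INC detail dicts, kept in first-seen order
groups = {}
reader = csv.DictReader(io.StringIO(sample), delimiter='|')
for row in reader:
    key = (row['ID'], row['CODE'], int(row['NUM']))
    groups.setdefault(key, []).append(
        {'INC': int(row['INC']), 'INC_DESC': row['INC_DESC']})

# One compact JSON object per group, i.e. one JSON Lines record each
lines = [
    json.dumps({'ID': i, 'CODE': c, 'NUM': n, 'INC_DTL': dtl}, separators=(',', ':'))
    for (i, c, n), dtl in groups.items()
]
print('\n'.join(lines))
```

The dictionary keeps groups in the order they first appear in the file, whereas `groupby` sorts by the key columns by default, so the line order can differ between the two approaches.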

edited Sep 30 '22 at 17:18

answered Sep 30 '22 at 15:18


It writes the output as one whole JSON document and not as individual lines – Koushik Chandra Sep 30 '22 at 17:12
I've updated my answer. Just add lines=True to the to_json parameters – Ze'ev Ben-Tsvi Sep 30 '22 at 17:16
I am getting output as below: – Koushik Chandra Sep 30 '22 at 19:57
{"ID":"F1","CODE":"W1","NUM":1,"INC_DTL":[]} {"ID":"F1","CODE":"W2","NUM":1,"INC_DTL":[]} {"ID":"F2","CODE":"W1","NUM":1,"INC_DTL":[]} – Koushik Chandra Sep 30 '22 at 20:03
This is how I am getting the output – Koushik Chandra Sep 30 '22 at 20:03
did you happen to check df right after the grouping? does it group INC and INC_DESC into a list of a list? – Ze'ev Ben-Tsvi Oct 01 '22 at 05:14
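
One way to run the check suggested in the last comment is to print the frame right after the grouping step and look at what a single cell holds. The sketch below is only an illustration: it assumes pandas is installed, reads a few of the question's rows from an in-memory string (`sample`) rather than the real file, and aggregates with the built-in `list`. The names `grouped` and `first_inc` are made up here.

```python
import io

import pandas as pd

# A few rows from the question, held in memory instead of the input file
sample = '''ID|CODE|NUM|INC|INC_DESC
"F1"|"W1"|1|1001|"INC1001"
"F1"|"W1"|1|1002|"INC1002"
"F2"|"W2"|1|1003|"INC1003"
'''
df = pd.read_csv(io.StringIO(sample), sep='|')

# Collect INC and INC_DESC into one list per (ID, CODE, NUM) group
grouped = df.groupby(['ID', 'CODE', 'NUM']).agg(list).reset_index()

# Inspect the whole frame, then the type and content of one aggregated cell
print(grouped)
first_inc = grouped.loc[0, 'INC']
print(type(first_inc), first_inc)
```

If the cell turns out to be something other than a flat list (for example a list wrapping a Series), the later `zip` over `INC` and `INC_DESC` will not see the individual values, which would be consistent with an empty `INC_DTL`.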
