I have a JSON document like below:
{
"Data": [{
"Code": "ABC",
"ID": 123456,
"Type": "Yes",
"Geo": "East"
}, {
"Code": "XYZ",
"ID": 987654,
"Type": "No",
"Geo": "West"
}],
"Total": 2,
"AggregateResults": null,
"Errors": null
}
My PySpark sample code:
getjsonresponsedata=json.dumps(getjsondata)
jsonDataList.append(getjsonresponsedata)
jsonRDD = sc.parallelize(jsonDataList)
df_Json=spark.read.json(jsonRDD)
display(df_Json.withColumn("Code",explode(col("Data.Code"))).withColumn("ID",explode(col("Data.ID"))).select('Code','ID'))
When I explode the JSON then I get below records (it looks like cross join)
Code ID
ABC 123456
ABC 987654
XYZ 123456
XYZ 987654
But I expect the records like below:
Code ID
ABC 123456
XYZ 987654
Could you please help me on how to get the expected result?