
I am trying to convert JSON data to a Parquet file. Below is my input.

{"time": 1637045320491, "device": {"type_id": 1}, "message": "Test message", "metadata": {"product": {"name": "prodName", "vendor_name": "XYZ"}, "version": "1.0.0", "original_time": "2021-11-16T12:18:40.491893+05:30"}, "tuid": 900201, "cluid": 9002, "activity_id": 1, "severity_id": 4, "cuid": 9}

The output does not keep the nested JSON structure. Instead, the nested fields are flattened, as shown below.

{"time":1637045320491,"device_type_id":1,"message":"Test message","metadata_product_name":"prodName","metadata_product_vendor_name":"XYZ","metadata_version":"1.0.0","metadata_original_time":24520491893000,"tuid":900201,"cluid":9002,"activity_id":1,"severity_id":4,"cuid":9}

Can someone help me get output with the same nested structure as the input JSON? I'm using the ChoETL package to convert to a Parquet file.

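using System;
using ChoETL;

// Write to a uniquely named Parquet file in the data folder.
var pqFile = @"D:\Data\" + Guid.NewGuid() + ".parquet";
using (var r = new ChoJSONReader(@"D:\Data\json-dump.json"))
using (var w = new ChoParquetWriter(pqFile))
{
    // Nested JSON nodes come out as flattened columns here.
    w.Write(r);
}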
Anand
  • JSON is a hierarchical format, while Parquet is a tabular format, so technically they are different. ChoETL takes a simple flattening approach to store each JSON node. If that does not meet your needs, you will have to find an appropriate way to flatten the JSON input into Parquet format. – Cinchoo Dec 05 '22 at 14:14
  • Thank you. I tried another approach, but the nested JSON gets stored as a string with double quotes. Is there any way to get rid of the double quotes? Thanks again for your help. – Anand Dec 06 '22 at 03:50
  • @Anand Did you find a solution for this? If so, please add an answer. – nelion Aug 31 '23 at 10:42
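
For readers trying to picture the flattening mentioned in the comments, here is a minimal sketch of how nested JSON keys can be joined with underscores to produce column names like `metadata_product_name`. It is an illustration only, not ChoETL's actual implementation: the `JsonFlattener` class and its `Flatten` method are hypothetical names, and the code uses only `System.Text.Json` from the .NET base library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of ChoETL.
public static class JsonFlattener
{
    // Flattens nested JSON objects into a single level,
    // joining the key path of each leaf value with '_'.
    public static Dictionary<string, object?> Flatten(string json)
    {
        var result = new Dictionary<string, object?>();
        using JsonDocument doc = JsonDocument.Parse(json);
        Walk(doc.RootElement, "", result);
        return result;
    }

    private static void Walk(JsonElement element, string prefix,
                             Dictionary<string, object?> result)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            // Recurse into each property, extending the key path.
            foreach (JsonProperty prop in element.EnumerateObject())
            {
                string key = prefix.Length == 0 ? prop.Name : prefix + "_" + prop.Name;
                Walk(prop.Value, key, result);
            }
            return;
        }
        result[prefix] = ToClrValue(element);
    }

    private static object? ToClrValue(JsonElement e)
    {
        switch (e.ValueKind)
        {
            case JsonValueKind.String: return e.GetString();
            case JsonValueKind.Number:
                if (e.TryGetInt64(out long l)) return l;
                return e.GetDouble();
            case JsonValueKind.True: return true;
            case JsonValueKind.False: return false;
            case JsonValueKind.Null: return null;
            default: return e.GetRawText(); // arrays are kept as raw JSON text in this sketch
        }
    }
}
```

Calling `JsonFlattener.Flatten` on the input from the question yields keys such as `device_type_id` and `metadata_product_vendor_name`, matching the column names in the flattened output. Note that this naming is lossy: because original keys such as `type_id` already contain underscores, the flattened names alone cannot be split back into the original nesting without extra information.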

0 Answers