Complex json log data transformation using?

Question

I am new to data science tools and need to transform JSON logs into flattened, columnar data, essentially a plain CSV. I looked at several tools and found that Apache Spark SQL can handle this easily. The catch is that my JSON logs can be complex structures with nested (hierarchical) arrays, so I would have to explode the dataset multiple times to flatten it.

I don't want to hard-code the transformation logic, because I want to reuse the same code with different transformations. Put another way, I want the transformation to be driven by configuration rather than code.

For that reason I looked into Apache Avro, which lets me define my own schema for the input. What I don't know is whether I can define the output schema as well. If not, I am back to reading the (generated) Avro data structure and filtering it in my own code.

One possible solution is to define my schema with the array fields marked by flags that tell my parser to explode on them. This might need to be recursive, repeating until the input schema has been transformed into the output schema, i.e. the transformation logic would be generated from my input and output schemas.
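
To make that idea concrete, here is a minimal sketch in plain Python (not Spark) of the kind of thing I have in mind. Everything in it is an assumption made up for illustration: the config layout with its "explode" and "columns" keys, the function names, and the sample record. Empty arrays are kept as a row with nulls, which is my own choice and may not be what you want.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical config: array paths to explode (parents listed before children)
# and a mapping of output column name -> dotted path in the input record.
CONFIG = {
    "explode": ["orders", "orders.items"],
    "columns": {
        "user": "user.name",
        "order_id": "orders.id",
        "sku": "orders.items.sku",
    },
}


def explode(record: Dict[str, Any], path: str) -> Iterator[Dict[str, Any]]:
    """Yield one copy of the record per element of the array at the dotted path."""
    head, _, rest = path.partition(".")
    value = record.get(head)
    if rest:
        # Recurse into the nested object; the parent array must already be exploded.
        if isinstance(value, dict):
            for child in explode(value, rest):
                yield {**record, head: child}
        else:
            yield record
    elif isinstance(value, list):
        # An empty array still produces one row, with None in place of the element.
        for element in value or [None]:
            yield {**record, head: element}
    else:
        yield record


def lookup(record: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def flatten(record: Dict[str, Any], config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply every configured explode in order, then project the output columns."""
    rows = [record]
    for path in config["explode"]:
        rows = [out for row in rows for out in explode(row, path)]
    return [
        {column: lookup(row, source) for column, source in config["columns"].items()}
        for row in rows
    ]


# Made-up sample log record with two levels of nested arrays.
SAMPLE = {
    "user": {"name": "ann"},
    "orders": [
        {"id": 1, "items": [{"sku": "a"}, {"sku": "b"}]},
        {"id": 2, "items": [{"sku": "c"}]},
    ],
}

if __name__ == "__main__":
    for row in flatten(SAMPLE, CONFIG):
        print(row)
```

For the sample record this gives three flat rows, one per item, each carrying the user name and order id along with the sku. Changing the transformation would then only mean changing CONFIG, which is the behaviour I am after, but I don't know whether an existing tool already does this properly.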

Is there a better approach that I am unaware of or haven't thought of?

