I'm working on a project where every day I need to deal with tons of Avro files. To extract the data from Avro I use Spark SQL. To do this, I first need to call printSchema and then select the fields to see the data. I want to automate this process: given any input Avro file, I want to write a script that automatically generates the Spark SQL query (taking into account the structs and arrays in the .avsc file). I'm okay with writing the script in Java or Python.
-- Sample input AVRO
root
|-- identifier: struct (nullable = true)
| |-- domain: string (nullable = true)
| |-- id: string (nullable = true)
| |-- version: long (nullable = true)
| |-- alternativeIdentifiers: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- identifier: struct (nullable = true)
| | | | |-- domain: string (nullable = true)
| | | | |-- id: string (nullable = true)
-- Output I'm expecting
SELECT identifier.domain, identifier.id, identifier.version
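
One possible approach, sketched in Python using only the standard library: since an .avsc file is JSON, the schema can be walked recursively, descending into records and collecting dotted paths for the leaves. This is a rough sketch under several assumptions, not a complete solution. The names `leaf_paths`, `build_select`, `SAMPLE_AVSC`, and the table name `sample` are hypothetical; the sample schema is only loosely modeled on the printSchema output above; arrays (and maps) are selected as whole columns rather than flattened; and named type references and multi-branch unions are treated as plain leaves.

```python
import json


def leaf_paths(avro_type, prefix=""):
    """Return dotted column paths for every leaf found under avro_type."""
    # A union such as ["null", {...}] is reduced to its single non-null branch.
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        if len(branches) == 1:
            return leaf_paths(branches[0], prefix)
        return [prefix]  # multi-branch union: keep the column as a whole
    if isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind == "record":
            paths = []
            for field in avro_type["fields"]:
                child = f"{prefix}.{field['name']}" if prefix else field["name"]
                paths.extend(leaf_paths(field["type"], child))
            return paths
        if isinstance(kind, (dict, list)):
            # Wrapper of the form {"type": {...}} or {"type": [...]}.
            return leaf_paths(kind, prefix)
    # Primitives, arrays, maps, enums, fixed: select the column as a whole.
    return [prefix]


def build_select(avsc_text, table):
    """Build a SELECT statement listing every leaf path of the schema."""
    schema = json.loads(avsc_text)
    return "SELECT " + ", ".join(leaf_paths(schema)) + " FROM " + table


# Hypothetical schema, loosely modeled on the sample above.
SAMPLE_AVSC = """
{"type": "record", "name": "Sample", "fields": [
  {"name": "identifier", "type": ["null", {
    "type": "record", "name": "Identifier", "fields": [
      {"name": "domain", "type": ["null", "string"]},
      {"name": "id", "type": ["null", "string"]},
      {"name": "version", "type": ["null", "long"]},
      {"name": "alternativeIdentifiers", "type": ["null", {
        "type": "array",
        "items": {"type": "record", "name": "AltId", "fields": [
          {"name": "domain", "type": ["null", "string"]},
          {"name": "id", "type": ["null", "string"]}
        ]}
      }]}
    ]}]}
]}
"""

if __name__ == "__main__":
    print(build_select(SAMPLE_AVSC, "sample"))
```

The generated string could then be passed to `spark.sql(...)` after registering the Avro data as a temporary view. Flattening array elements (for example with `LATERAL VIEW explode`) would need extra handling in the array case, which this sketch deliberately leaves out.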