I have a Glue column whose data type in Glue is:
struct<quantity:bigint,unit:bigint>
However, when Spark infers this schema, it converts the Glue type to Spark metadata and saves it in the Glue table properties as follows:
{
"name": "columnName",
"type": {
"type": "struct",
"fields": [
{
"name": "quantity",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "unit",
"type": "long",
"nullable": true,
"metadata": {}
}
]
},
"nullable": true,
"metadata": {}
}
Is there a library or built-in function in Glue or Spark that can convert a Glue column type to Spark metadata in Java? I need to convert these Glue data types to Spark metadata.
The incoming Glue column data types can also be nested structures of maps, arrays, and structs.
Another example of a Glue data type:
struct<column1:string,averageHeight:double,employeeName:string,firstName:string,secondName:string,listOfBooks:bigint,price:bigint,studentId:bigint,offerPrice:struct<quantityOfBooks:bigint,class:bigint>,bookStore:string,reviewCount:bigint,author:string,title:string,studentRollNo:string>
Spark conversion:
{
"name": "studentData",
"type": {
"type": "struct",
"fields": [
{
"name": "column1",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "averageHeight",
"type": "double",
"nullable": true,
"metadata": {}
},
{
"name": "employeeName",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "firstName",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "secondName",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "listOfBooks",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "price",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "studentId",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "offerPrice",
"type": {
"type": "struct",
"fields": [
{
"name": "quantityOfBooks",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "class",
"type": "long",
"nullable": true,
"metadata": {}
}
]
},
"nullable": true,
"metadata": {}
},
{
"name": "bookStore",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "reviewCount",
"type": "long",
"nullable": true,
"metadata": {}
},
{
"name": "author",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "title",
"type": "string",
"nullable": true,
"metadata": {}
},
{
"name": "studentRollNo",
"type": "string",
"nullable": true,
"metadata": {}
}
]
},
"nullable": true,
"metadata": {}
}
Note that I need to do this in Java. I'm aware that with a Spark DataFrame I can call df.schema().prettyJson() to get the Spark metadata for the Glue type. However, I need to perform this conversion in Java code. What is the best approach for this conversion?
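To make the expected conversion concrete, here is a minimal sketch of it in plain Java, written as a small recursive-descent parser over the Glue type string. It is not a Glue or Spark API: the class GlueTypeToSparkJson and its methods typeJson and fieldJson are hypothetical names chosen for illustration. It assumes every field is nullable with empty metadata (as in the examples above), renames only bigint, int, smallint and tinyint, passes every other primitive name through unchanged, does not escape special characters in field names, and emits compact rather than pretty-printed JSON. If Spark is on the classpath, DataType.fromDDL(...) followed by prettyJson() may do the same job, but treat that as an assumption to verify against the Spark version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns a Glue/Hive type string into Spark schema JSON. */
final class GlueTypeToSparkJson {
    private final String src;
    private int pos = 0;

    private GlueTypeToSparkJson(String src) { this.src = src; }

    /** Returns the JSON that goes under the "type" key of a Spark field. */
    static String typeJson(String glueType) {
        GlueTypeToSparkJson p = new GlueTypeToSparkJson(glueType.replaceAll("\\s+", ""));
        String json = p.parseType();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at " + p.pos);
        }
        return json;
    }

    /** Wraps a type in the field envelope: name, type, nullable, metadata. */
    static String fieldJson(String name, String glueType) {
        return field(name, typeJson(glueType));
    }

    // Assumption: every field is nullable and carries no metadata.
    private static String field(String name, String typeJson) {
        return "{\"name\":\"" + name + "\",\"type\":" + typeJson
                + ",\"nullable\":true,\"metadata\":{}}";
    }

    private String parseType() {
        String name = readName().toLowerCase(Locale.ROOT);
        switch (name) {
            case "struct": {
                expect('<');
                List<String> fields = new ArrayList<>();
                do {
                    String fieldName = readName();   // field names keep their case
                    expect(':');
                    fields.add(field(fieldName, parseType()));
                } while (tryConsume(','));
                expect('>');
                return "{\"type\":\"struct\",\"fields\":[" + String.join(",", fields) + "]}";
            }
            case "array": {
                expect('<');
                String element = parseType();
                expect('>');
                return "{\"type\":\"array\",\"elementType\":" + element + ",\"containsNull\":true}";
            }
            case "map": {
                expect('<');
                String key = parseType();
                expect(',');
                String value = parseType();
                expect('>');
                return "{\"type\":\"map\",\"keyType\":" + key + ",\"valueType\":" + value
                        + ",\"valueContainsNull\":true}";
            }
            default:
                if (name.isEmpty()) throw new IllegalArgumentException("Expected a type at " + pos);
                return "\"" + sparkName(name) + "\"";
        }
    }

    // Only the renames listed here are handled; other names pass through unchanged.
    private static String sparkName(String glueName) {
        switch (glueName) {
            case "bigint":   return "long";
            case "int":      return "integer";
            case "smallint": return "short";
            case "tinyint":  return "byte";
            default:         return glueName;
        }
    }

    // Reads up to the next delimiter, keeping a parenthesised suffix such as decimal(10,2) intact.
    private String readName() {
        int start = pos;
        while (pos < src.length() && "<>,:".indexOf(src.charAt(pos)) < 0) {
            if (src.charAt(pos) == '(') {
                int close = src.indexOf(')', pos);
                if (close < 0) throw new IllegalArgumentException("Unclosed '(' at " + pos);
                pos = close;
            }
            pos++;
        }
        return src.substring(start, pos);
    }

    private boolean tryConsume(char c) {
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(fieldJson("columnName", "struct<quantity:bigint,unit:bigint>"));
    }
}
```

Calling GlueTypeToSparkJson.fieldJson("columnName", "struct<quantity:bigint,unit:bigint>") should yield the same structure as the first example above, only without the line breaks. Nested maps and arrays go through the same recursive parseType call, so a type such as map<string,array<int>> needs no extra handling.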