I have SQL output that I create from a parquet file. I want to convert the resulting DataFrame into the format shown below (StructType/StructField) using PySpark (not Scala).
df = self.spark.read.parquet("test.parquet")
df.createOrReplaceTempView("vw_test")
sql = """ SELECT id, name, test_id FROM vw_test """
I want to convert this output into:

"test": [
  { "id": "67", "name": "APPLE INC", "test_id": "1027" },
  { "id": "67", "name": "APPLE INC", "test_id": "1028" },
  { "id": "67", "name": "APPLE INC", "test_id": "1029" },
  { "id": "268", "name": "KETO INC", "test_id": "1127" },
  { "id": "269", "name": "DAVE INC", "test_id": "1227" }
]
Basically, the SQL output should follow the StructType and StructField definition below:

schema = StructType([
    StructField(
        "test_info",
        StructType([
            StructField("id", StringType(), True),
            StructField("name", StringType(), True),
            StructField("test_id", StringType(), True),
        ]),
    )
])