I have SQL output that I am creating from a Parquet file. I want to convert this SQL DataFrame into the format shown below (StructType/StructField) using PySpark (not Scala).

df = self.spark.read.parquet("test.parquet")
df.createOrReplaceTempView("vw_test")

sql = f""" SELECT id, name, test_id FROM vw_test """

+---+---------+-------+
|Id |Name     |test_id|
+---+---------+-------+
|67 |APPLE INC|1027   |
|67 |APPLE INC|1028   |
|67 |APPLE INC|1029   |
|268|KETO     |1127   |
|269|DAVE     |1227   |
+---+---------+-------+

I want to convert this output into:

"test": [
  { "id": "67", "name": "APPLE INC", "test_id": "1027" },
  { "id": "67", "name": "APPLE INC", "test_id": "1028" },
  { "id": "67", "name": "APPLE INC", "test_id": "1029" },
  { "id": "268", "name": "KETO INC", "test_id": "1127" },
  { "id": "269", "name": "DAVE INC", "test_id": "1227" }
]

Basically, the SQL output should follow the StructType and StructField schema below:

schema = StructType([
    StructField(
        "test_info",
        StructType([
            StructField("id", StringType(), True),
            StructField("name", StringType(), True),
            StructField("test_id", StringType(), True),
        ]),
    )
])

Pooja

1 Answer

You can use the PySpark to_json function to convert the rows of your DataFrame into a JSON document: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.to_json.html

Nick
  • My input data is not in a format that I can easily convert into JSON. – Pooja Oct 14 '22 at 15:45
  • @Pooja Could you elaborate? Your question seemed to indicate you already had it in a dataframe, is that accurate? – Nick Oct 14 '22 at 21:02