I am using an Azure Synapse notebook in PySpark to read the JSON payload from an API call (the payload contains an array) into a DataFrame and later write it to a database.
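For context, this is roughly how I pull the payload into a DataFrame (the endpoint URL and the requests call below are placeholders, not my real call):

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical endpoint; the real call also passes authentication.
response = requests.get("https://example.com/api/data")

# Wrap the raw JSON string in an RDD so spark.read.json can parse it.
df = spark.read.json(spark.sparkContext.parallelize([response.text]))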
Code:
from pyspark.sql.functions import explode

# Flatten the Data array into one row per element.
df.select(explode("Data").alias("d")).select("d.Date", "d.Unit", "d.Code")
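If I assign the result to a new DataFrame and show it, I expect one flat row per array element:

from pyspark.sql.functions import explode

df_exploded = df.select(explode("Data").alias("d")).select("d.Date", "d.Unit", "d.Code")
df_exploded.show(truncate=False)
# Expected output:
# +-----------------------------+----+----+
# |Date                         |Unit|Code|
# +-----------------------------+----+----+
# |2021-10-01T00:00:00.000-08:00|HR  |0012|
# |1900-10-02T00:00:00.000-05:00|HR  |0021|
# +-----------------------------+----+----+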
JSON:
{"Data":[{"Date":"2021-10-01T00:00:00.000-08:00","Unit":"HR","Code":"0012"},{"Date":"1900-10-02T00:00:00.000-05:00","Unit":"HR","Code":"0021"}]}
Schema:
root
|-- Data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Date: string (nullable = true)
| | |-- Unit: string (nullable = true)
| | |-- Code: string (nullable = true)
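For completeness, here is the same schema written out explicitly, in case inference is part of the problem (a sketch; the types are copied from the printSchema output above):

from pyspark.sql.types import StructType, StructField, ArrayType, StringType

# One array column "Data" holding structs of three string fields.
schema = StructType([
    StructField("Data", ArrayType(StructType([
        StructField("Date", StringType(), True),
        StructField("Unit", StringType(), True),
        StructField("Code", StringType(), True),
    ]), True), True)
])

Passing this to spark.read.json(..., schema=schema) would make the read independent of inference.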
DataFrame:
DataFrame[Data: array<struct<Date:string,Unit:string,Code:string>>]
{"Data":[{"Date":"2021-10-01T00:00:00.000-08:00","Unit":"HR","Code":"0012"},{"Date":"1900-10-02T00:00:00.000-05:00","Unit":"HR","Code":"0021"}]}
Error:
Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 737, in save
    self._jwrite.save()
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 85, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: Can't get JDBC type for array<struct<Date:string,Unit:string,Code:string>>
Write from the DataFrame to the database:
# Connection details (placeholder values).
server_name = "jdbc:sqlserver://ServerName"
database_name = "database_name"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "table_name"
username = "username"
password = "password"

# Write to SQL Server with the Apache Spark connector for SQL Server.
df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .save()
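My guess from the error is that the write fails because df still holds the unexploded array column. This is what I was planning to try instead (a sketch reusing the connection options above):

from pyspark.sql.functions import explode

# Flatten first, then write the flat DataFrame.
df_exploded = df.select(explode("Data").alias("d")) \
    .select("d.Date", "d.Unit", "d.Code")

df_exploded.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .save()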
Could you please guide me on how to explode the contents of the API JSON payload and write the data to a database table?