Usage-Batch
I have a PySpark DataFrame with 750+ columns and 2.5M records, roughly 6.5 GB in size. I am doing a bulk (batch) insert from a Databricks Python notebook into an Azure Synapse table.
Below is the sample code from the Databricks documentation (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html):
spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")

df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", "my_table_in_dw_copy") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
    .save()
Problem statement: I need to do the same for another PySpark DataFrame that has one column containing JSON strings of more than 8,000 characters. In the underlying Synapse table this column is of type nvarchar(max). The code above does not work for columns longer than 4,000 characters.
How can I adapt the code above to handle this situation?
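For context, this is the workaround I am currently considering, shown as a minimal sketch only. My understanding is that the connector's maxStrLength option maps StringType columns to NVARCHAR(maxStrLength), but since NVARCHAR(n) tops out at 4,000, that would not cover my case. So the sketch below assumes the target table (my_table_with_json and its payload column are hypothetical names) is pre-created in Synapse with NVARCHAR(MAX) and the write appends to it instead of letting the connector create the table. I am not sure this is the right approach.

# Sketch of a possible workaround (assumes the target table already exists
# in Synapse with the JSON column declared as NVARCHAR(MAX)), e.g.:
#
#   CREATE TABLE my_table_with_json (   -- hypothetical table name
#       id BIGINT,
#       payload NVARCHAR(MAX)           -- column holding the long JSON strings
#   );

spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")

(df_json.write                          # df_json: the DataFrame with the long JSON column
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "my_table_with_json")  # pre-created table, not created by the connector
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
    .mode("append")                     # append so the existing NVARCHAR(MAX) schema is kept
    .save())

Does this append-into-a-pre-created-table approach work for columns over 4,000 characters, or is there a better option on the connector itself?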