Usage-Batch
I have a PySpark DataFrame with 750+ columns and 2.5M records, roughly 6.5 GB in size. I am doing a bulk (batch) insert from a Databricks Python notebook into an Azure Synapse table.
Below is the sample code from the Databricks documentation (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html):
spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")

df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", "my_table_in_dw_copy") \
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
    .save()
Problem statement: I need to do the same for another PySpark DataFrame that has one column containing JSON strings of more than 8,000 characters. In the underlying Synapse table this column is of type nvarchar(max). The code above does not work for columns longer than 4,000 characters.
How can I adapt the code above to handle this situation?
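For context, this is the workaround I am currently considering, shown as a minimal sketch only. My understanding is that the connector's maxStrLength option maps StringType columns to NVARCHAR(maxStrLength), but since NVARCHAR(n) tops out at 4,000, that would not cover my case. So the sketch below assumes the target table (my_table_with_json and its payload column are hypothetical names) is pre-created in Synapse with NVARCHAR(MAX) and the write appends to it instead of letting the connector create the table. I am not sure this is the right approach.

# Sketch of a possible workaround (assumes the target table already exists
# in Synapse with the JSON column declared as NVARCHAR(MAX)), e.g.:
#
#   CREATE TABLE my_table_with_json (   -- hypothetical table name
#       id BIGINT,
#       payload NVARCHAR(MAX)           -- column holding the long JSON strings
#   );

spark.conf.set(
    "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
    "<your-storage-account-access-key>")

(df_json.write                          # df_json: the DataFrame with the long JSON column
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "my_table_with_json")  # pre-created table, not created by the connector
    .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
    .mode("append")                     # append so the existing NVARCHAR(MAX) schema is kept
    .save())

Does this append-into-a-pre-created-table approach work for columns over 4,000 characters, or is there a better option on the connector itself?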