
Usage (Batch)

I have a PySpark DataFrame with 750+ columns and 2.5M records, roughly 6.5 GB in size. I am doing a bulk (batch) insert from a Databricks Python notebook into an Azure Synapse table.

Below is the sample code as per the Microsoft documentation (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html)

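# Set the storage account access key so the connector can stage data in the Blob container referenced by tempDir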
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

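# Write the DataFrame to the Synapse table; the data is first staged in tempDir and then loaded into Synapse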
df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_table_in_dw_copy") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .save()

Problem Statement: I need to do the same for another PySpark DataFrame, which contains a column holding more than 8000 characters of JSON. In the underlying Synapse table this column is of type nvarchar(max). The code above does not work for columns longer than 4000 characters.

Please advise how to handle this situation in the above code.
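
For illustration, here is a rough sketch of one possible workaround (an assumption on my part, not something I have verified): pre-create the target table in Synapse with the JSON column declared as nvarchar(max), then append to it from Databricks so the connector does not create the table with its default string mapping. The DataFrame name df_json and the table definition below are placeholders.

# Assumes the target table already exists in Synapse with the long column
# declared as nvarchar(max), e.g. (placeholder schema):
#   CREATE TABLE my_table_in_dw_copy (..., json_col nvarchar(max), ...)
df_json.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_table_in_dw_copy") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .mode("append") \
  .save()

As noted in the comments below, the connector's maxStrLength option cannot be raised far enough to cover JSON values longer than 8000 characters, so it does not solve this on its own.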

Varun05
  • Hi @varun05, I think this is covered by my previous answer: https://stackoverflow.com/a/63815914/5139495 on this question: https://stackoverflow.com/questions/60521948/azure-databricks-to-azure-sql-dw-long-text-columns – jabberwocky Nov 02 '20 at 17:43
  • Does this answer your question? [Azure Databricks to Azure SQL DW: Long text columns](https://stackoverflow.com/questions/60521948/azure-databricks-to-azure-sql-dw-long-text-columns) – jabberwocky Nov 02 '20 at 17:45
  • You cannot set more than 8000 characters for maxStrLength – Varun05 Dec 03 '20 at 04:30

0 Answers