
I am trying to write dataframe data into a table in Azure SQL from Databricks using PySpark. The table dbo.test already exists in the database, and I am able to read it before I execute the write operation below.

testDf.write.format("com.microsoft.sqlserver.jdbc.spark").mode("overwrite")\
        .option("truncate", "false")\
        .option("url", azure_sql_url).option("dbtable", 'dbo.test')\
        .option("databaseName", database_name)\
        .option("user", username) \
        .option("password", password) \
        .option("encrypt", "true")\
        .option("hostNameInCertificate", "*.database.windows.net")\
        .option("bulkCopyBatchSize", 10000).option("bulkCopyTableLock", "true")\
        .option("bulkCopyTimeout", "6000000")\
        .save()

After executing this command the following error is returned:

java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.schemaString(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;Lscala/Option;)Ljava/lang/String;

Surprisingly, the dbo.test table gets deleted.

Can someone help me understand why this is happening? The same code works fine in another environment.

Ramaraju.d

1 Answer

java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.schemaString(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;Lscala/Option;)Ljava/lang/String;

The cause of the error is a version mismatch between Spark SQL and the Spark connector for SQL Server. Also, mode("overwrite") by default drops the table if it already exists and re-creates it, which is why dbo.test disappeared when the write then failed.
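
If the intent is to replace the data but keep the table, truncate can be set to true so Spark truncates the existing table instead of dropping and re-creating it. A minimal sketch, reusing the question's variables:

# Overwrite by truncating the existing table rather than dropping it,
# so dbo.test keeps its definition even if the write later fails.
testDf.write.format("com.microsoft.sqlserver.jdbc.spark").mode("overwrite")\
        .option("truncate", "true")\
        .option("url", azure_sql_url).option("dbtable", 'dbo.test')\
        .option("user", username) \
        .option("password", password) \
        .save()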

Here are the supported versions of the Spark connector for the respective Spark versions (a version-check snippet follows the list):

  • Spark 2.4.x : com.microsoft.azure:spark-mssql-connector:1.0.2
  • Spark 3.0.x : com.microsoft.azure:spark-mssql-connector_2.12:1.1.0
  • Spark 3.1.x : com.microsoft.azure:spark-mssql-connector_2.12:1.2.0
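
To confirm which coordinate your cluster needs, you can check the Spark and Scala versions from a notebook first. A small sketch using standard PySpark attributes (the _jvm accessor is internal, but a common way to read the Scala version):

print(spark.version)   # e.g. 3.0.1 -> use connector 1.1.0
print(spark.sparkContext._jvm.scala.util.Properties.versionString())   # e.g. "version 2.12.10"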

Resolution:

If you are using a Databricks runtime version greater than 10, downgrade to Databricks runtime 9.1 LTS or lower and install the connector that matches its Spark version.

  • I downgraded the Databricks runtime version to 7.3 LTS.

  • Then installed the appropriate library for Spark 3.0.x, which is com.microsoft.azure:spark-mssql-connector_2.12:1.1.0.

  • Then tried your code, and it worked fine.

# Write via the Microsoft SQL Server Spark connector.
df_name.write.format("com.microsoft.sqlserver.jdbc.spark").mode("overwrite")\
        .option("truncate", "false")\
        .option("url", "Azure_sql_url").option("dbtable", 'dbo.test')\
        .option("databaseName", "databasename")\
        .option("user", "username") \
        .option("password", "password") \
        .option("encrypt", "true")\
        .option("hostNameInCertificate", "*.database.windows.net")\
        .option("bulkCopyBatchSize", 10000).option("bulkCopyTableLock", "true")\
        .option("bulkCopyTimeout", "6000000")\
        .save()
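
To verify the write, the table can be read back through the same connector (same placeholder credentials as above):

# Quick check that the rows landed in dbo.test.
spark.read.format("com.microsoft.sqlserver.jdbc.spark")\
        .option("url", "Azure_sql_url").option("dbtable", 'dbo.test')\
        .option("user", "username") \
        .option("password", "password") \
        .load() \
        .show(5)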


Or you can directly use the built-in JDBC source with df_name.write.format("jdbc").

Sample code:

# Built-in Spark JDBC source; no separate connector library required.
# Note: the bulkCopy* options below are specific to the Microsoft connector
# and are not interpreted by the built-in jdbc source.
df_name.write.format("jdbc").mode("overwrite")\
        .option("truncate", "false")\
        .option("url", "Azure_sql_url").option("dbtable", 'dbo.test')\
        .option("databaseName", "databasename")\
        .option("user", "username") \
        .option("password", "password") \
        .option("encrypt", "true")\
        .option("hostNameInCertificate", "*.database.windows.net")\
        .option("bulkCopyBatchSize", 10000).option("bulkCopyTableLock", "true")\
        .option("bulkCopyTimeout", "6000000")\
        .save()
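
For reference, the "Azure_sql_url" placeholder in both snippets is a standard SQL Server JDBC URL; the server and database names below are examples only:

azure_sql_url = "jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>"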

You can also refer to a similar issue on GitHub or this similar SO thread.

Reference: https://learn.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16

Pratik Lad
  • Thanks for the details @Pratik. I downgraded the Databricks runtime version from 10.4 to 9.1 LTS (Spark 3.1.2, Scala 2.12), uninstalled the old connectors and installed com.microsoft.azure:spark-mssql-connector_2.12:1.2.0, but I still face the same issue and am unable to create a new table. – Ramaraju.d Jun 16 '23 at 08:30
  • Please check the updated answer. – Pratik Lad Jun 16 '23 at 09:19
  • Hey, thanks @Pratik, it worked. After uninstalling the previous library I hadn't restarted my cluster; once restarted, it now works. Thanks a lot :-) – Ramaraju.d Jun 19 '23 at 10:14