
I'm stuck on an issue that has already cost me 3 days. I have a Dataproc 1.5 cluster, and I set up SQL Server on a Google Cloud VM running CentOS 7. I can't connect to SQL Server through PySpark from the Dataproc cluster. The error snapshot is attached. SSL encryption is disabled on SQL Server. From the Dataproc cluster I can reach SQL Server with sqlcmd (installed on the cluster) and with the pymssql library, but not with PySpark. The same error occurs when I try to access MS SQL from Sqoop. Please advise; I have tried every solution I could find online with no luck. Thanks in advance. My connection code is:

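from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the existing session in the pyspark shell
# Read table "xxx" from SQL Server over JDBC (encryption is disabled on the server)
df = spark.read.format("jdbc") \
    .option("url", "jdbc:sqlserver://x.x.x.x:1433;encrypt=false;databaseName=gcp") \
    .option("dbtable", "xxx") \
    .option("user", "xxx") \
    .option("password", "xxxx") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()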
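The JDBC URL above packs several SQL Server driver properties into one semicolon-separated string. While debugging, it can help to build that string from keyword arguments so properties such as `encrypt` (or the driver's `trustServerCertificate`) can be toggled without hand-editing the URL. The helper `build_sqlserver_jdbc_url` below is hypothetical, a small sketch rather than part of PySpark or the Microsoft driver:

```python
def build_sqlserver_jdbc_url(host, database, port=1433, **props):
    """Assemble a jdbc:sqlserver:// URL (hypothetical helper for illustration).

    Extra keyword arguments become driver properties; Python booleans are
    written as lowercase "true"/"false".
    """
    parts = [f"jdbc:sqlserver://{host}:{port}", f"databaseName={database}"]
    for key, value in props.items():  # keyword order is preserved (Python 3.7+)
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}={value}")
    return ";".join(parts)
```

Passing `.option("url", build_sqlserver_jdbc_url("x.x.x.x", "gcp", encrypt=False))` yields the same properties as the question's URL, only in a different order, which the driver does not depend on.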

Connection Error Snapshot

  • Are you running really old jdbc drivers? Have you seen [SQL Server JDBC Error on Java 8: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption](https://stackoverflow.com/questions/32766114/sql-server-jdbc-error-on-java-8-the-driver-could-not-establish-a-secure-connect)? – AlwaysLearning Feb 18 '21 at 09:23
  • How do you know SSL is disabled, does SQL Server have Force Encryption set to On? – Charlieface Feb 18 '21 at 09:47
  • This could happen because Dataproc uses Conscrypt by default, try to disable it via [cluster properties](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties): `dataproc:dataproc.conscrypt.provider.enable=false` – Igor Dvorzhak Feb 18 '21 at 20:15
  • Thanks @AlwaysLearning. I tried almost every driver version, from the oldest to the newest, but none of them worked, because the problem lies in the Dataproc cluster: the property mentioned by Igor Dvorzhak just needs to be disabled to get a successful connection to MS SQL Server. – Muhammad Waqas Jamil Feb 22 '21 at 07:34
  • Thanks @Charlieface, I checked it with this query: "SELECT session_id, encrypt_option FROM sys.dm_exec_connections". – Muhammad Waqas Jamil Feb 22 '21 at 07:41
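The encryption check from the comments can also be run from Python on the cluster, using the pymssql path the question says already works. The sketch below works with any DB-API 2.0 connection; the function name `connection_encryption` is hypothetical, and the query is the one quoted in the comment above:

```python
# Query quoted in the comments: lists every session and whether it is encrypted
ENCRYPTION_QUERY = "SELECT session_id, encrypt_option FROM sys.dm_exec_connections"


def connection_encryption(conn):
    """Return (session_id, encrypt_option) rows using a DB-API 2.0 connection.

    Hypothetical helper; on SQL Server, encrypt_option is reported per session.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(ENCRYPTION_QUERY)
        return [(row[0], row[1]) for row in cursor.fetchall()]
    finally:
        cursor.close()  # release the cursor even if the query fails
```

With pymssql, `conn` could come from `pymssql.connect(server="x.x.x.x", user="xxx", password="xxxx", database="gcp")`, reusing the placeholders from the question.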

1 Answer


This could happen because Dataproc uses Conscrypt by default to improve performance.

Depending on the MS SQL JDBC driver version you use, it may have bugs that cause connection failures when Conscrypt is used.
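Before recreating the cluster, it may help to confirm that Conscrypt is actually registered as a JVM security provider. A minimal sketch, assuming access to the driver JVM through py4j (`spark.sparkContext._jvm` is a private PySpark attribute) and assuming Conscrypt registers under the provider name "Conscrypt"; it only inspects the driver JVM, not the executors:

```python
def jvm_security_providers(jvm):
    """Names of the java.security providers in a JVM, highest priority first."""
    return [provider.getName() for provider in jvm.java.security.Security.getProviders()]


def conscrypt_is_active(jvm):
    """True if a provider named "Conscrypt" is registered (assumed provider name)."""
    return "Conscrypt" in jvm_security_providers(jvm)
```

In a pyspark session this would be called as `conscrypt_is_active(spark.sparkContext._jvm)`.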

To work around this issue, try disabling Conscrypt when you create the Dataproc cluster, via cluster properties:

gcloud dataproc clusters create $CLUSTER_NAME \
  --properties=dataproc:dataproc.conscrypt.provider.enable=false
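The same property can be set when creating the cluster from Python with the `google-cloud-dataproc` client library. This is a sketch under assumptions: `project_id`, `region` and `cluster_name` are placeholders, and everything in the cluster config other than the software property is left to Dataproc's defaults (a real cluster would normally also set machine types and an image version):

```python
CONSCRYPT_PROPERTY = "dataproc:dataproc.conscrypt.provider.enable"


def cluster_spec(project_id, cluster_name, disable_conscrypt=True):
    """Build a Dataproc Cluster resource as a dict; only software properties are set."""
    properties = {CONSCRYPT_PROPERTY: "false"} if disable_conscrypt else {}
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {"software_config": {"properties": properties}},
    }


def create_cluster(project_id, region, cluster_name):
    """Create the cluster (requires google-cloud-dataproc and GCP credentials)."""
    from google.cloud import dataproc_v1  # lazy import so cluster_spec works without it

    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": cluster_spec(project_id, cluster_name),
        }
    )
    return operation.result()  # blocks until the create operation finishes
```

As with the `gcloud` flag, the property is applied at creation time, so an existing cluster would need to be recreated.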
Igor Dvorzhak
  • Thank you so much @Igor Dvorzhak! Yes, this is the right answer: this property must be disabled to access the MS SQL Server database. Thanks again. – Muhammad Waqas Jamil Feb 22 '21 at 07:32