I am stuck on a problem where a Java security setting prevents my Dataproc cluster (image 2.0.32-debian10) running PySpark from connecting to SQL Server 2019 with the Spark/JDBC connector (spark:spark.jars.packages=com.microsoft.azure:spark-mssql-connector_2.12:1.2.0 plus sqljdbc4-2.0.jar). The security levels of the cluster and of SQL Server 2019 (running on a GCE instance) apparently do not match, and the handshake fails with this SSL/TLS error:
```
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "No appropriate protocol (protocol is disabled or cipher suites are inappropriate)".
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1412)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1058)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:833)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:716)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:841)
```
I can bypass this by using the Jupyter notebook file browser to edit java.security directly on the local disk (Local Disk/usr/lib/jvm/temurin-8-jdk-amd64/jre/lib/security/java.security) and comment out this entry:

```
jdk.tls.disabledAlgorithms=SSLv3, TLSv1, TLSv1.1, RC4, DES, MD5withRSA, \
    DH keySize < 1024, EC keySize < 224, 3DES_EDE_CBC, anon, NULL, \
    include jdk.disabled.namedCurves
```

After that I can connect to SQL Server successfully.
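For reference, the manual edit amounts to something like the sketch below; it assumes the stock Temurin 8 path from image 2.0.32-debian10, that the entry uses backslash line continuations as in the shipped file, and that it runs as root on the node in question:

```python
# Sketch of the manual java.security edit, done programmatically.
# Assumes the Temurin 8 path on image 2.0.32-debian10; run as root.
from pathlib import Path

sec_file = Path("/usr/lib/jvm/temurin-8-jdk-amd64/jre/lib/security/java.security")

out, in_entry = [], False
for line in sec_file.read_text().splitlines(keepends=True):
    if line.startswith("jdk.tls.disabledAlgorithms="):
        in_entry = True
    if in_entry:
        out.append("#" + line)
        # continuation lines in java.security end with a backslash
        in_entry = line.rstrip("\n").rstrip().endswith("\\")
    else:
        out.append(line)

sec_file.write_text("".join(out))
```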
However, this works only when the Dataproc cluster is a single-node cluster. On a standard cluster (e.g. 1 master + 2 workers) I hit the same error as before; the change to the security properties seems to go unnoticed, presumably because the job is distributed to worker nodes that never see the edit made to java.security on the master's local disk.
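One way to check that hypothesis is to grep the file from inside the tasks themselves, so the workers report what their own java.security actually says. This is a sketch; it assumes the same Temurin 8 path on every node and that `spark` is the active SparkSession:

```python
# Sketch: ask the executors what their local java.security contains.
# Assumes the same Temurin 8 path on every node in the cluster.
import socket
import subprocess

SEC_FILE = "/usr/lib/jvm/temurin-8-jdk-amd64/jre/lib/security/java.security"

def disabled_algorithms(_):
    out = subprocess.check_output(
        ["grep", "-A2", "jdk.tls.disabledAlgorithms=", SEC_FILE], text=True
    )
    return socket.gethostname() + "\n" + out

# several partitions so the tasks land on more than one worker
reports = spark.sparkContext.parallelize(range(8), 8).map(disabled_algorithms).collect()
for report in set(reports):
    print(report)
```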
I tried passing these as cluster properties, hoping that would apply the security-property change on the worker nodes as well:
```
spark:spark.driver.extraJavaOptions='-Djava.security.properties==gs://greenline-demo-341617-spark-src/Demo/02_Microsoft_SQL/java.security',
spark:spark.executor.extraJavaOptions='-Djava.security.properties==gs://greenline-demo-341617-spark-src/Demo/02_Microsoft_SQL/java.security'
```
But now I'm stuck at this error instead:
```
Exception in thread "main" java.lang.InternalError: internal error: SHA-1 not available.
    at sun.security.provider.SecureRandom.init(SecureRandom.java:108)
    at sun.security.provider.SecureRandom.<init>(SecureRandom.java:79)
    at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:198)
    at java.security.SecureRandom.<init>(SecureRandom.java:162)
    at java.util.UUID$Holder.<clinit>(UUID.java:96)
    at java.util.UUID.randomUUID(UUID.java:142)
```
The connection string:

```python
url_db = "jdbc:sqlserver://10.148.xx.xx:1433;databaseName=AdventureWorks2019;sslProtocol=TLSv1;encrypt=false"
```
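For completeness, url_db is consumed by the spark-mssql-connector roughly like this (the table name and credentials here are placeholders):

```python
# Roughly how url_db is used; table and credentials are placeholders.
df = (
    spark.read.format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", url_db)
    .option("dbtable", "Person.Person")
    .option("user", "my_user")
    .option("password", "my_password")
    .load()
)
df.show(5)
```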