
I created a simple pipeline in Data Fusion, which reads from a single MS SQL Server table and writes to BigQuery. It fails when connecting to the SQL Server with a socket handshake error.

I've seen this issue when creating my own Dataproc clusters, and I know it is caused by Dataproc using Conscrypt as the default SSL provider. I also found a workaround, which is to set a property when creating the cluster: dataproc:dataproc.conscrypt.provider.enable=false
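As a rough illustration of that workaround on a self-managed cluster, the sketch below assembles a `gcloud dataproc clusters create` command that passes the property through `--properties`. The cluster name and region are hypothetical placeholders, and the script only prints the command rather than running it.

```python
import shlex

# Hypothetical placeholders; substitute your own cluster name and region.
cluster_name = "my-cluster"
region = "us-central1"

# The cluster property from the workaround described above.
conscrypt_property = "dataproc:dataproc.conscrypt.provider.enable=false"

# Build the argument list for the gcloud CLI.
command = [
    "gcloud", "dataproc", "clusters", "create", cluster_name,
    "--region", region,
    "--properties", conscrypt_property,
]

# Print the command as a single shell-safe string instead of executing it.
print(shlex.join(command))
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed line can be pasted into a shell where `gcloud` is installed and authenticated.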

Setting that property is not possible when running Data Fusion, as I have no control over how the cluster is created. I've tried adding the property to the engine config section, but it has no effect, and the property doesn't show up on the Dataproc cluster configuration page.

This is the stacktrace in Data Fusion:

java.net.SocketException: Socket is closed
    at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:390) ~[libconscrypt.jar:1.2.0-SNAPSHOT]
    at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225) ~[libconscrypt.jar:1.2.0-SNAPSHOT]
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1688) ~[na:na]
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1977) ~[na:na]
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1628) ~[na:na]
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1459) ~[na:na]
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:773) ~[na:na]
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168) ~[na:na]
    at io.cdap.plugin.db.JDBCDriverShim.connect(JDBCDriverShim.java:60) ~[na:na]
    at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[na:1.8.0_212]
    at java.sql.DriverManager.getConnection(DriverManager.java:208) ~[na:1.8.0_212]

I just want to read data from SQL Server in Data Fusion.

Bjoern

1 Answer


This happens because Dataproc uses the Conscrypt SSL provider by default, and that provider has a bug when creating the SSL context.

Solution: To fix the issue, disable Conscrypt when the Dataproc cluster is created for the pipeline run. This can be done by setting the following runtime argument for the pipeline (key, then value):

system.profile.properties.dataproc:dataproc.conscrypt.provider.enable false

[Screenshot: setting this runtime argument for a pipeline in the UI]
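For anyone who prefers to see the key and value spelled out, here is a minimal sketch that builds the runtime argument as a key/value map and prints it as JSON. The helper name `to_runtime_argument` is hypothetical, and treating `system.profile.properties.` as a general prefix for cluster properties is an assumption drawn from the single example above.

```python
import json

# Prefix taken from the runtime argument shown above (assumed to be a general pattern).
PROFILE_PROPERTY_PREFIX = "system.profile.properties."


def to_runtime_argument(cluster_property: str, value: str) -> dict:
    """Hypothetical helper: wrap a Dataproc cluster property as a runtime argument."""
    return {PROFILE_PROPERTY_PREFIX + cluster_property: value}


# The property that disables the Conscrypt provider.
runtime_args = to_runtime_argument(
    "dataproc:dataproc.conscrypt.provider.enable", "false"
)

# Show the key/value pair in JSON form.
print(json.dumps(runtime_args, indent=2))
```

The key is everything up to the space in the line above, and the value is the string `false`.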

Sree