Problem
Pipelines that read from or write to Microsoft SQL Server fail with the following error:
Error: "Socket is closed". ClientConnectionId:<ID>
Symptom
Pipelines that are configured to run on Dataproc and that read from or write to SQL Server fail at run time with a "Socket is closed" exception. The complete error message is as follows:
java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "Socket is closed". ClientConnectionId:<ID>
    at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171) ~[hadoop-mapreduce-client-core-2.8.5.jar:na]
This happens because Dataproc uses the Conscrypt SSL provider by default, and Conscrypt has a bug that occurs when an SSL context is created.
Solution
To fix the issue, disable Conscrypt when the Dataproc cluster is created. To do this, set the following runtime argument for the pipeline:
system.profile.properties.dataproc:dataproc.conscrypt.provider.enable false
The following screenshot shows how to set this runtime argument for a pipeline in the UI.
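When runtime arguments are supplied programmatically rather than through the UI, the same setting can be expressed as a key/value map. The sketch below is only an illustration: the helper name `conscrypt_disabled_args` is hypothetical, and the split of the key into a `system.profile.properties.` prefix plus the Dataproc property `dataproc:dataproc.conscrypt.provider.enable` is an assumed reading of the runtime argument shown above. How the resulting map is submitted to the pipeline is outside the scope of this example.

```python
import json

# Assumption: this prefix marks a runtime argument as a compute profile
# property override. It is copied from the runtime argument shown above.
PROFILE_PROPERTY_PREFIX = "system.profile.properties."

# Dataproc property that enables or disables the Conscrypt SSL provider,
# also copied from the runtime argument shown above.
CONSCRYPT_PROPERTY = "dataproc:dataproc.conscrypt.provider.enable"


def conscrypt_disabled_args(existing_args=None):
    """Return a copy of existing_args with Conscrypt disabled.

    Hypothetical helper for illustration; the input mapping is not modified.
    """
    args = dict(existing_args or {})
    args[PROFILE_PROPERTY_PREFIX + CONSCRYPT_PROPERTY] = "false"
    return args


if __name__ == "__main__":
    # Print the runtime arguments as JSON so they can be reviewed or reused.
    print(json.dumps(conscrypt_disabled_args(), indent=2))
```

Returning a copy keeps any runtime arguments that are already defined for the pipeline and adds only the Conscrypt setting.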