JSchException during pipeline execution
Problem
You get a JSchException caused by a java.net.ConnectException: Connection timed out error or an Auth fail error. In these cases, your pipeline doesn't run because Cloud Data Fusion is unable to SSH to the Cloud Dataproc cluster's master node.
Symptom
Pipelines are configured by default to run on a remote Cloud Dataproc cluster. When you run your pipeline, Cloud Data Fusion connects to the cluster's master node over SSH and launches a Hadoop job from that node. If Cloud Data Fusion cannot SSH to the master node because of missing network connectivity or an authentication failure, the pipeline run fails and a JSchException appears in the pipeline logs.
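As a rough way to see which kind of failure a host would hit, the sketch below attempts a plain TCP connection to port 22 and reports whether it opened, timed out, or was rejected. It is only an approximation: the function name `probe_ssh_port` and the example address are hypothetical, and the result reflects the network path from the machine where you run it, which may differ from the path Cloud Data Fusion uses.

```python
import socket


def probe_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Try a TCP connect and return 'open', 'timeout', or 'unreachable'."""
    try:
        # create_connection performs the TCP handshake only; no SSH auth happens here.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all, which is typical when a firewall silently drops packets.
        return "timeout"
    except OSError:
        # Connection refused, no route to host, name resolution failure, and so on.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical master node address; replace with your own.
    print(probe_ssh_port("10.128.0.2"))
```

A "timeout" result is consistent with the Connection timed out case, whereas "open" suggests the network path is fine and any remaining failure is more likely on the authentication side.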
There are two common cases in which you might get a JSchException:
java.net.ConnectException: Connection timed out error:
java.io.IOException: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:82) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillPreparer.lambda$start$0(RemoteExecutionTwillPreparer.java:429) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$0(RemoteExecutionTwillRunnerService.java:519) ~[na:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
Auth fail error:
java.io.IOException: com.jcraft.jsch.JSchException: Auth fail
at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:82) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillPreparer.lambda$start$0(RemoteExecutionTwillPreparer.java:429) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService$ControllerFactory.lambda$create$0(RemoteExecutionTwillRunnerService.java:519) ~[na:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
Caused by: com.jcraft.jsch.JSchException: Auth fail
at com.jcraft.jsch.Session.connect(Session.java:519) ~[com.jcraft.jsch-0.1.54.jar:na]
at com.jcraft.jsch.Session.connect(Session.java:183) ~[com.jcraft.jsch-0.1.54.jar:na]
at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:79) ~[na:na]
... 7 common frames omitted
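To tell the two cases apart when scanning pipeline logs, a simple substring check on the exception text is usually enough. The following is a minimal sketch, not part of Cloud Data Fusion; the function name `classify_jsch_failure` and its return labels are hypothetical, and the matched strings are taken from the stack traces above.

```python
def classify_jsch_failure(log_text: str) -> str:
    """Label a log excerpt by the JSchException case it contains."""
    if "com.jcraft.jsch.JSchException" not in log_text:
        return "not-jsch"
    # Case 1: the master node could not be reached on the network.
    if "java.net.ConnectException: Connection timed out" in log_text:
        return "connection-timeout"
    # Case 2: the node was reached but SSH authentication was rejected.
    if "JSchException: Auth fail" in log_text:
        return "auth-fail"
    return "other-jsch"


if __name__ == "__main__":
    sample = "java.io.IOException: com.jcraft.jsch.JSchException: Auth fail"
    print(classify_jsch_failure(sample))
```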
Solution
If the error message you get contains java.net.ConnectException: Connection timed out, the likely cause is that the firewall rules in your project don't allow ingress connections on port 22. New projects start with a default network that is pre-populated with a firewall rule, default-allow-ssh. This firewall rule allows ingress connections on port 22 from any source to any instance in the network. If no such firewall rule exists in the network used by your Cloud Data Fusion instance, create one, then rerun your pipeline.
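One way to create such a rule is with the `gcloud compute firewall-rules create` command. The sketch below only assembles and prints that command so you can review it before running it yourself. The helper name, the rule name `allow-ssh`, and the network name `my-network` are hypothetical placeholders, and the default source range of `0.0.0.0/0` is an assumption that mirrors the "any source" behavior described for default-allow-ssh.

```python
import shlex


def build_allow_ssh_rule_command(rule_name, network, source_ranges="0.0.0.0/0"):
    """Return the gcloud argument list for an ingress rule that allows TCP port 22."""
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",             # network used by the Cloud Data Fusion instance
        "--direction=INGRESS",
        "--allow=tcp:22",                   # SSH
        f"--source-ranges={source_ranges}",
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own rule and network names.
    command = build_allow_ssh_rule_command("allow-ssh", "my-network")
    print(shlex.join(command))
```

The command is built as an argument list rather than one string so that it can be passed to `subprocess.run` without shell quoting issues if you choose to execute it from Python.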
If the error message you get contains Auth fail, the likely cause is a known issue that was resolved on May 23, 2019. If you're getting this error, the Cloud Data Fusion instance you're running might have been created before that date and therefore doesn't have the fix. Create a new instance.