Problem
Describe the issue. Provide a code snippet or screenshot of the error if relevant. Optionally, provide a brief overview of why this problem occurs. Link to the Cloud Data Fusion release notes if applicable.
For example,
Your pipeline fails with a JSchException that contains a java.net.ConnectException: Connection timed out error or an Auth fail error.
Your pipeline is likely failing because Cloud Data Fusion is unable to SSH to the Cloud Dataproc cluster’s master node.
Solution(s)
Describe possible solutions for the issue. List steps for the solution(s) and/or link to other docs that have the steps needed.
There may be more than one solution to a problem. If so:
Divide this Solutions section into subsections.
Provide a brief overview of the possible solutions before diving into the subsections.
Make the most common/probable solution the first subsection.
Write your headings as actions that summarize the solution.
For example,
If the error message you’re seeing contains java.net.ConnectException: Connection timed out, it’s likely that your project is missing a necessary firewall rule. If the error message contains Auth fail, you might be running a Cloud Data Fusion instance that is missing a fix to a known issue.
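The triage described above can be sketched as a small lookup over the pipeline's error text. This is only an illustration: the function name suggest_fix and the labels it returns are hypothetical, and only the two error strings come from this article.

```python
def suggest_fix(log_text: str) -> str:
    """Map a pipeline failure message to the matching solution subsection.

    The returned labels are hypothetical names for the subsections of this
    article; they are not identifiers used by Cloud Data Fusion.
    """
    if "java.net.ConnectException: Connection timed out" in log_text:
        # A connection timeout points at a missing firewall rule.
        return "create-missing-firewall-rule"
    if "Auth fail" in log_text:
        # An authentication failure points at the known instance issue.
        return "create-new-instance"
    # Anything else is outside the scope of this article.
    return "unknown"
```

Passing the full stack trace as log_text is enough, because the check is a plain substring match.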
Create a missing firewall rule
If the error message you’re seeing contains the message java.net.ConnectException: Connection timed out, it’s likely that your project is missing a necessary firewall rule.
New projects start with a default network that is pre-populated with the firewall rule default-allow-ssh. This firewall rule allows incoming connections on port 22 from any source to any VM instance in the network.
Follow these steps to create the missing firewall rule. Then, rerun your pipeline.
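Before creating the rule, it can help to confirm that it is really missing. The sketch below assumes you have saved the output of gcloud compute firewall-rules list --format=json and that each rule uses the Compute Engine field names direction, disabled, and allowed (with IPProtocol and ports). The helper names are hypothetical, and the check deliberately ignores the network, source ranges, and target tags, so treat a positive result as a hint rather than proof.

```python
import json


def _covers_port_22(port_spec: str) -> bool:
    """Return True if a port spec such as "22" or "20-25" includes port 22."""
    low, _, high = port_spec.partition("-")
    return int(low) <= 22 <= int(high or low)


def allows_ssh_ingress(rule: dict) -> bool:
    """Return True if one firewall rule allows incoming TCP traffic on port 22."""
    # Assumption: a rule without a "direction" field is treated as ingress.
    if rule.get("disabled") or rule.get("direction", "INGRESS") != "INGRESS":
        return False
    for allowed in rule.get("allowed", []):
        protocol = allowed.get("IPProtocol")
        if protocol == "all":
            return True
        if protocol == "tcp":
            ports = allowed.get("ports")
            # A tcp entry without a "ports" list allows every TCP port.
            if not ports or any(_covers_port_22(p) for p in ports):
                return True
    return False


def has_ssh_rule(rules_json: str) -> bool:
    """Return True if any rule in the JSON list allows SSH ingress."""
    return any(allows_ssh_ingress(rule) for rule in json.loads(rules_json))
```

If has_ssh_rule returns False for your project's rules, the missing firewall rule is the likely cause of the timeout.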
Create a new Cloud Data Fusion instance
If the error message you’re seeing contains Auth fail, it’s likely because your Cloud Data Fusion instance contains a known issue that was resolved on May 23, 2019. You probably created your instance before this issue was fixed.
Follow these steps to create a new Cloud Data Fusion instance. Then, rerun your pipeline in the new instance.
...
Pipelines with a large number of nodes (20 or more) need more driver memory. If the driver memory is set too low, the driver crashes and the pipeline fails with the following error: “Malformed reply from SOCKS server”.
Solution(s)
In “Configure”, set the driver resources to 8 GB.
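If you manage pipelines as exported JSON rather than through the UI, the same change might be scripted as below. This is a sketch under a loud assumption: that the exported pipeline keeps the driver memory at config.driverResources.memoryMB. Check that layout against your own export before relying on it. The function name set_driver_memory is hypothetical.

```python
import json

DRIVER_MEMORY_MB = 8 * 1024  # 8 GB, the value recommended above


def set_driver_memory(pipeline_json: str, memory_mb: int = DRIVER_MEMORY_MB) -> str:
    """Return the pipeline JSON with the driver memory set to memory_mb.

    Assumption: the driver memory lives at config.driverResources.memoryMB.
    Missing objects on that path are created; other settings are kept.
    """
    pipeline = json.loads(pipeline_json)
    config = pipeline.setdefault("config", {})
    driver = config.setdefault("driverResources", {})
    driver["memoryMB"] = memory_mb
    return json.dumps(pipeline, indent=2)
```

The function only rewrites the text you pass in, so the result still has to be imported or deployed before the new setting takes effect.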
...
Related articles