Note

See the public version of this document. If you update this page, please also update the public page.

This article explains, step by step, how to run Cloud Data Fusion pipelines against an existing Dataproc cluster. This feature ("Execution environment selection") is available only in the Enterprise edition of Cloud Data Fusion.

...

  1. SSH Setup on Dataproc Cluster.

    1. In the Google Cloud console, open the Dataproc page and click your cluster's name to open its Cluster details page.

    2. Under VM Instances, click the SSH button for the master node to connect to the Dataproc master VM.

    3. To create a new SSH key pair, run:

      1. ssh-keygen -m PEM -t rsa -b 4096 -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]

      2. Leave the passphrase empty; that is, press Enter when prompted for one.

    4. This creates two key files:

      1. ~/.ssh/[KEY_FILENAME] (Private Key)

      2. ~/.ssh/[KEY_FILENAME].pub (Public Key)

    5. To print the keys so that you can copy them, run:

      1. cat ~/.ssh/[KEY_FILENAME].pub

      2. cat ~/.ssh/[KEY_FILENAME]

    6. Navigate to the Compute Engine (GCE) VM instance details page. Click Metadata > SSH Keys, then click Edit and add the full public key printed in step 1.e.i (the contents of [KEY_FILENAME].pub). Make sure the key is a single line: delete any newlines that were pasted along with it.
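Steps 1.c to 1.f can also be run without interactive prompts. The following is a minimal bash sketch, assuming it runs in the SSH session on the master VM and that OpenSSH's `ssh-keygen` is available there (step 1.c already relies on it). `KEY_DIR`, `KEY_FILENAME`, `KEY_USER` and the defaults `dataproc-key` and `dataproc-user` are hypothetical placeholders for `~/.ssh`, `[KEY_FILENAME]` and `[USERNAME]`. The fingerprint check (`ssh-keygen -l`) and the newline stripping (`tr`) are additions that are not part of the original steps.

```shell
#!/usr/bin/env bash
# Sketch: steps 1.c to 1.f run without interactive prompts.
# KEY_DIR, KEY_FILENAME and KEY_USER are hypothetical placeholders for
# ~/.ssh, [KEY_FILENAME] and [USERNAME]; override them via the environment.
set -euo pipefail

KEY_DIR="${KEY_DIR:-$HOME/.ssh}"
KEY_FILENAME="${KEY_FILENAME:-dataproc-key}"
KEY_USER="${KEY_USER:-${USER:-dataproc-user}}"
PRIVATE_KEY="$KEY_DIR/$KEY_FILENAME"
PUBLIC_KEY="$PRIVATE_KEY.pub"

# Refuse to overwrite an existing key; ssh-keygen would otherwise prompt.
if [ -e "$PRIVATE_KEY" ]; then
  echo "error: $PRIVATE_KEY already exists" >&2
  exit 1
fi

# Create the key directory with mode 700 if it does not exist yet.
mkdir -p -m 700 "$KEY_DIR"

# Same flags as step 1.c; -N "" supplies the empty passphrase (the
# equivalent of pressing Enter) and -q suppresses the randomart output.
ssh-keygen -q -m PEM -t rsa -b 4096 -N "" -f "$PRIVATE_KEY" -C "$KEY_USER"

# With -m PEM the first line is a legacy PEM header, not
# "-----BEGIN OPENSSH PRIVATE KEY-----".
head -n 1 "$PRIVATE_KEY"

# Print the key size and fingerprint; this fails if the public key is malformed.
ssh-keygen -l -f "$PUBLIC_KEY"

# Print the public key on a single line for Metadata > SSH Keys (step 1.f).
tr -d '\r\n' < "$PUBLIC_KEY"
echo
```

For example, running the script with `KEY_FILENAME=my-key KEY_USER=alice` set (both values hypothetical) writes `~/.ssh/my-key` and `~/.ssh/my-key.pub`, then prints the public key as one line, which is the form step 1.f asks for. Passing `-N ""` makes the empty passphrase explicit instead of relying on pressing Enter at a prompt.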

...