This article outlines the steps required to connect to sources and sinks that reside outside of the customer project. The workflow within Data Fusion is the same; you will simply need to grant access to a couple of service accounts so CDF can reach those resources. Note that by default Data Fusion can read and write to BigQuery, GCS, Pub/Sub, Spanner, and Bigtable in the project where the Data Fusion instance is created, so these steps are not needed for sources and sinks in the customer project.

Service Account Description

Data Fusion creates and uses several service accounts, and they are often confused with one another. The table below describes each service account and its function:

Service account: service-<project_number>@gcp-sa-datafusion.iam.gserviceaccount.com

Description: Service account used by CDF in the tenant project to access resources in the customer project.

Uses in CDF:

  • Access resources during Preview

  • Access resources from Wrangler

  • Create the Dataproc cluster in the customer project

Service account: <project_number>-compute@developer.gserviceaccount.com

Description: Default service account used by the Dataproc VMs.

Uses in CDF:

  • Access resources during a pipeline run for a deployed pipeline
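Both service-account emails follow a fixed pattern based on the project number. A minimal shell sketch (the project number below is a placeholder; look up your own as shown in the comment):

```shell
# Placeholder value — find yours with:
#   gcloud projects describe PROJECT_ID --format="value(projectNumber)"
PROJECT_NUMBER="123456789012"

# Data Fusion service account in the tenant project
DATAFUSION_SA="service-${PROJECT_NUMBER}@gcp-sa-datafusion.iam.gserviceaccount.com"

# Default Compute Engine service account used by the Dataproc VMs
COMPUTE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

echo "$DATAFUSION_SA"
echo "$COMPUTE_SA"
```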

...

  1. Navigate to the customer project that contains the CDF instance and copy the project number (this is found on the Home Page in the Project Info card).

  2. Navigate to the project that contains the resources you would like to interact with.

  3. In the sidebar, click on IAM & Admin.

  4. Click on Add at the top of the page.

  5. Provide the first service account name from the table above. Be sure to replace <project_number> with the actual number you obtained in step 1.

  6. Grant the Admin role for the resource you would like to interact with, e.g. BigQuery Admin for reading/writing to BigQuery. For BigQuery, you will also need to grant the BigQuery Data Owner role.

  7. Repeat steps 5 & 6 for the second service account in the table above.

  8. In your data pipeline, ensure you define the correct Project Id for the sources/sinks. Using ‘auto-detect’ will default to the customer project that contains the CDF instance.
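The grants in steps 3–7 can also be scripted with the gcloud CLI. A sketch for the BigQuery case, assuming placeholder project values (replace the project number and project ID with your own; these commands require gcloud to be authenticated against the resource project):

```shell
# Placeholder values — substitute your own
CDF_PROJECT_NUMBER="123456789012"       # project number of the project hosting the CDF instance (step 1)
RESOURCE_PROJECT_ID="my-resource-project"  # project containing the external sources/sinks (step 2)

# The two service accounts from the table above
DATAFUSION_SA="service-${CDF_PROJECT_NUMBER}@gcp-sa-datafusion.iam.gserviceaccount.com"
COMPUTE_SA="${CDF_PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Grant BigQuery Admin and BigQuery Data Owner to each account (steps 5-7)
for SA in "$DATAFUSION_SA" "$COMPUTE_SA"; do
  gcloud projects add-iam-policy-binding "$RESOURCE_PROJECT_ID" \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.admin"
  gcloud projects add-iam-policy-binding "$RESOURCE_PROJECT_ID" \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.dataOwner"
done
```

For other resources, swap in the matching Admin role (for example roles/storage.admin for GCS or roles/pubsub.admin for Pub/Sub).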

...