This article outlines the steps required to connect to sources and sinks that reside outside of the customer project. The workflow within Cloud Data Fusion (CDF) is the same; you simply need to grant access to a couple of service accounts so that CDF can reach those resources.
Service Account Description
There is some confusion regarding the various service accounts Data Fusion creates and uses to operate. This table provides a breakdown of each service account and its function.
Service account name format | Description | Uses in CDF |
---|---|---|
service-<project_number>@gcp-sa-datafusion.iam.gserviceaccount.com | Service account used by CDF in the tenant project to access resources in the customer project. | This account is used to: |
<project_number>-compute@developer.gserviceaccount.com | Default service account used by the Dataproc VMs. | This account is used to: |
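As a small illustration of the two name formats in the table, the sketch below fills in `<project_number>` for both accounts. The function name `data_fusion_service_accounts`, the dictionary keys, and the sample project number `123456789012` are hypothetical and chosen only for this example.

```python
def data_fusion_service_accounts(project_number: str) -> dict[str, str]:
    """Build both service account emails from the CDF project's number."""
    # The formats use the numeric project number, not the project ID.
    if not project_number.isdigit():
        raise ValueError("project_number must be numeric (not the project ID)")
    return {
        # Account used by CDF in the tenant project.
        "data_fusion": f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com",
        # Default account used by the Dataproc VMs.
        "dataproc_default": f"{project_number}-compute@developer.gserviceaccount.com",
    }


# Hypothetical project number, for illustration only.
accounts = data_fusion_service_accounts("123456789012")
for label, email in accounts.items():
    print(f"{label}: {email}")
```

The numeric check is there because the project ID and the project number are easy to confuse, and only the number fits these formats.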
Steps to connect external resources
Navigate to the customer project that contains the CDF instance and copy the project number (found on the Home page, in the Project Info card).
Navigate to the project that contains the resources you would like to interact with.
In the sidebar, click on ‘IAM & Admin’.
Click on ‘Add’ at the top of the page.
Provide the first service account name from the table above. Be sure to replace <project_number> with the actual number you obtained in step 1.
Grant the Admin role for the resource you would like to interact with, e.g. BigQuery Admin for reading from and writing to BigQuery.
Repeat steps 5 & 6 for the second service account in the table above.
In your pipeline, ensure you define the correct Project Id for the sources/sinks. Using ‘auto-detect’ will default to the customer project that contains the CDF instance.
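For readers who prefer the command line to the console, the grant steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure this article prescribes: it assumes a project-level grant through `gcloud projects add-iam-policy-binding`, uses `roles/bigquery.admin` as the example role, and the helper name `grant_commands`, the project number `123456789012`, and the project ID `external-project` are all hypothetical. The code only builds and prints the command strings; it does not execute them.

```python
import shlex


def grant_commands(cdf_project_number: str, target_project_id: str,
                   role: str = "roles/bigquery.admin") -> list[str]:
    """Build one gcloud command per service account for the target project."""
    members = [
        # Account used by CDF in the tenant project.
        f"serviceAccount:service-{cdf_project_number}@gcp-sa-datafusion.iam.gserviceaccount.com",
        # Default account used by the Dataproc VMs.
        f"serviceAccount:{cdf_project_number}-compute@developer.gserviceaccount.com",
    ]
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", target_project_id,
            f"--member={member}", f"--role={role}",
        ])
        for member in members
    ]


# Hypothetical values: the CDF project's number and the external project's ID.
commands = grant_commands("123456789012", "external-project")
for command in commands:
    print(command)
```

A project-level binding is broader than a grant on a single dataset or bucket, so review the printed commands and adjust the role before running them.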