This article describes how to connect a private Data Fusion instance to a private CloudSQL instance hosting a mysql database.
Background
Public CloudSQL instances can be accessed from Cloud Data Fusion without any additional setup as we use the socket factory driver and with appropriate connection string. Steps for connecting to a public CloudSQL instance can be found here.
For connecting a private CloudSQL instance we need a proxy GCE VM that helps us to connect Cloud Data Fusion to CloudSQL.
Instructions
Create private Cloud Data Fusion instance as per instructions here. This involves
Setting up VPC network
Allocate IP range
Create private Cloud SQL instance in the same VPC network as Cloud Data Fusion instance.
Once the instance is created, go to the CloudSQL instance details page to the “Connect to this instance” section and copy the IP and the connection name.
Now that both the instances are setup create a private GCE VM (with only internal IP) using the following gcloud command,
export PROJECT=<customer-project> export REGION=<vm-region> export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT}| sed 's/.*\///'` export SUBNET=<customer-vpc-subnet-name> export NAME=<gce-vm-name> export MYSQL_CONN=<mysql-instance-connection-name> # Create the VM gcloud beta compute --project=${PROJECT} instances create ${NAME} --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy -instances=${MYSQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE --scopes=https://www.googleapis.com/auth/cloud-platform --image=cos-69-10895-385-0 --image-project=cos-cloud # Get the VM internal IP export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} | grep "networkIP" | awk '{print $2}'` # Promote the VM internal IP to static IP gcloud compute addresses create mysql-proxy --addresses ${IP} --region ${REGION} --subnet ${SUBNET} # Note down the IP to be used in CDF MySQL JDBC connection string echo ${IP}
Once the VM is created with a static IP now use this as the host for the mysql database to be accessed from Cloud Data Fusion.
Accessing MySQL from within Data Fusion
Before you begin accessing the mysql instance from Cloud Data Fusion instance, make sure you have installed the mysql driver from Hub (or directly from mysql portal).
An example of accessing the mysql database from Wrangler can look like the following screenshot,
The same can be accessed directly from pipeline studio, using Database source plugin with similar properties (JDBC connection string, username and password & plugin type) like the following screenshot,
A similar setup should work for both Postgres and SQLServer instances as well.