Connect to private cloudsql-mysql instance from private Cloud Data Fusion instance
This article describes how to connect a private Data Fusion instance to a private CloudSQL instance hosting a mysql database.
Background
Public CloudSQL instances can be accessed from Cloud Data Fusion without any additional setup as we use the socket factory driver and with appropriate connection string. Steps for connecting to a public CloudSQL instance can be found here.
For connecting a private CloudSQL instance we need a proxy GCE VM that helps us to connect Cloud Data Fusion to CloudSQL.
Instructions
Create a private Cloud Data Fusion instance as per instructions here. This involves
Setting up VPC network
Allocating IP range
Create a private Cloud SQL instance in the same VPC network as Cloud Data Fusion instance.
Once the instance is created, go to the CloudSQL instance details page to the “Connect to this instance” section and copy the IP and the connection name.
Now that both the instances are set up, create a private GCE VM (with only internal IP) using the following gcloud command:
export PROJECT=<customer-project>
export REGION=<vm-region>
export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT}| sed 's/.*\///'`
export SUBNET=<customer-vpc-subnet-name>
export NAME=<gce-vm-name>
export MYSQL_CONN=<mysql-instance-connection-name>
# Create the VM
gcloud beta compute --project=${PROJECT} instances create ${NAME} --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy -instances=${MYSQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE --scopes=https://www.googleapis.com/auth/cloud-platform --image=cos-69-10895-385-0 --image-project=cos-cloud
# Get the VM internal IP
export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} | grep "networkIP" | awk '{print $2}'`
# Promote the VM internal IP to static IP
gcloud compute addresses create mysql-proxy --addresses ${IP} --region ${REGION} --subnet ${SUBNET}
# Note down the IP to be used in CDF MySQL JDBC connection string
echo ${IP}
Once the VM is created with a static IP, now use this as the host for the mysql database to be accessed from Cloud Data Fusion.
Accessing MySQL from within Data Fusion
Before you begin accessing the mysql instance from Cloud Data Fusion instance, make sure you have installed the mysql driver from Hub (or directly from mysql portal).
An example of accessing the mysql database from Wrangler can look like the following screenshot:
The same can be accessed directly from the Pipeline Studio by using the Database source plugin with similar properties (JDBC connection string, username and password & plugin type), like the following screenshot:
A similar setup should work for both Postgres and SQLServer instances as well.
Related articles