Snowflake Data Loading Action

Introduction

Data loading in Snowflake (https://docs.snowflake.net/manuals/user-guide/data-load-overview.html) is performed in two separate steps. The first step loads the files into a staging area, and the second step loads the staged data into the target table with a COPY INTO command. This action plugin executes both steps using the Snowflake JDBC driver.

Use case(s)

  • As a financial services company, I would like to build an analytics pipeline to push data into Snowflake. I would leverage Hydrator and use this plugin as an action to create files in the staging area and trigger the loading of the data.

User Story(s)

  • I would like to create files in the staging area and trigger loading of the data.
  • I would like the capability to load files from both internal and external (S3) locations.

Plugin Type

  • Action

Configurables

This section defines properties that are configurable for this plugin. 

User Facing Name | Type | Description | Constraints
Account Name | String | Snowflake account name |
User Name | String | Username |
Password | String | Password |
Warehouse Name | String | Snowflake warehouse |
Database Name | String | Database |
Schema | String | DB schema |
Table Name | String | Table where data is to be loaded |
Stage Name | String | Staging area where data would be loaded intermediately |
Data Source | Select | Source of the data to be loaded. Defaults to Internal. |
File Path/S3 Bucket URL | String | Path (internal) or URL (AWS S3) of the data to be loaded |
File Format Type | Select | File format to specify the type of file. Defaults to CSV. Possible values are: CSV, AVRO, PARQUET, JSON, XML |
Authentication Method | Select | Authentication method. Defaults to Access Credentials. | For IAM authentication, the cluster should be hosted on AWS servers
S3 Access Key | String | Access key for AWS S3 to connect to. | Mandatory if Data Source is external (AWS S3).
S3 Secret Access Key | String | Secret access key for AWS S3 to connect to. | Mandatory if Data Source is external (AWS S3).
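
The account, user, password, warehouse, database, and schema configurables map onto a standard Snowflake JDBC connection. A minimal sketch of how the plugin might open that connection (class and parameter names here are illustrative, not the plugin's actual implementation):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class SnowflakeConnectionSketch {

  // Builds a Snowflake JDBC connection from the plugin configurables.
  public static Connection connect(String accountName, String userName, String password,
                                   String warehouse, String dbName, String schema) throws Exception {
    // The Snowflake JDBC URL is derived from the account name.
    String url = "jdbc:snowflake://" + accountName + ".snowflakecomputing.com";

    Properties props = new Properties();
    props.put("user", userName);         // User Name
    props.put("password", password);     // Password
    props.put("warehouse", warehouse);   // Warehouse Name
    props.put("db", dbName);             // Database Name
    props.put("schema", schema);         // Schema

    return DriverManager.getConnection(url, props);
  }
}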

Design / Implementation Tips

Design

Approach(s)

1. For loading files from an internal location, the following high-level commands would be executed in Snowflake using JDBC (both flows are sketched after this list):
     a. CREATE OR REPLACE STAGE
     b. PUT files onto the staging location from the specified path; globbing patterns (i.e. wildcards) are accepted.
     c. COPY INTO the specified Snowflake table.
2. For loading files from an external location, i.e. S3, the following high-level commands would be executed in Snowflake using JDBC:
     a. CREATE OR REPLACE STAGE using the URL option
     b. COPY INTO the specified Snowflake table, which has a single VARIANT column.
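
A minimal sketch of both flows over JDBC, assuming a Connection obtained as in the earlier sketch. Stage, table, and path values are placeholders that would come from the plugin configuration; a real implementation should validate and escape them rather than concatenate strings:

import java.sql.Connection;
import java.sql.Statement;

public class SnowflakeLoadSketch {

  // Internal location: create the stage, PUT the local files onto it, then COPY INTO the table.
  public static void loadFromInternal(Connection conn, String stageName, String tableName,
                                      String localPath) throws Exception {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE OR REPLACE STAGE " + stageName);
      // Globbing patterns (wildcards) are accepted in the local path, e.g. /data/part-*.csv
      stmt.execute("PUT file://" + localPath + " @" + stageName);
      stmt.execute("COPY INTO " + tableName + " FROM @" + stageName
          + " FILE_FORMAT=(TYPE='CSV')");
    }
  }

  // External location (S3): create the stage with the URL option and S3 credentials,
  // then COPY INTO the table.
  public static void loadFromS3(Connection conn, String stageName, String tableName,
                                String s3Url, String accessKey, String secretAccessKey) throws Exception {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE OR REPLACE STAGE " + stageName
          + " URL='" + s3Url + "'"
          + " CREDENTIALS=(AWS_KEY_ID='" + accessKey + "' AWS_SECRET_KEY='" + secretAccessKey + "')");
      stmt.execute("COPY INTO " + tableName + " FROM @" + stageName
          + " FILE_FORMAT=(TYPE='JSON')");
    }
  }
}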

Properties

accountName: Snowflake account name
userName: Username
password: Password
warehouse: Snowflake warehouse
dbName: Database Name
schema: DB Schema
tableName: The Snowflake table name where the data will be loaded
stageName: The staging location where the data from the data source will be loaded
dataSource: Source of the data to be loaded. Defaults to Internal
path: Path (internal) or URL (AWS S3) of the data to be loaded
fileFormatType: File format to specify the type of file. Defaults to CSV
s3AuthenticationMethod: Authentication method. Defaults to Access Credentials. For IAM authentication, the cluster should be hosted on AWS servers
accessKey: Access key for AWS S3 to connect to. Mandatory if dataSource is AWS S3
secretAccessKey: Secret access key for AWS S3 to connect to. Mandatory if dataSource is AWS S3

Limitation(s)

1. The COPY INTO command ignores staged data files that were already loaded into the table. The FORCE parameter is not used, since reloading the same files is not required. https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html
2. The JSON/XML/AVRO/PARQUET file formats can produce one and only one column, of type VARIANT, OBJECT, or ARRAY. Use the CSV file format if you want to load more than one column (see the sketch after this list).
3. Default values for the format options are used. https://docs.snowflake.net/manuals/sql-reference/sql/create-stage.html
4. The CREATE OR REPLACE STAGE command is used to create the stage, so that if a stage already exists with a different set of files, it will be replaced.
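
As an illustration of limitation 2, a semi-structured load targets a table with a single VARIANT column. The table and stage names below are hypothetical:

import java.sql.Connection;
import java.sql.Statement;

public class VariantLoadSketch {

  // JSON/XML/AVRO/PARQUET data lands in one VARIANT column; CSV is needed for multi-column loads.
  public static void copyIntoVariantTable(Connection conn) throws Exception {
    try (Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE TABLE IF NOT EXISTS events_raw (v VARIANT)");
      stmt.execute("COPY INTO events_raw FROM @events_stage FILE_FORMAT=(TYPE='JSON')");
    }
  }
}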

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Load Avro data from S3 to Snowflake using S3 keys
  • Load Parquet data from S3 to Snowflake using S3 keys

Sample Pipeline

S3SinkAvroCredentials1_Snow_v2-cdap-data-pipeline.json

S3SinkParquetCredentials1_Snow_v2-cdap-data-pipeline.json

Pipeline #1

Pipeline #2

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature