Snowflake database plugin
Introduction
A separate database plugin to support Snowflake-specific features and configurations.
Use-Case
- Users can choose and install Snowflake source and sink plugins.
- Users should see Snowflake logo on plugin configuration page for better experience.
- Users should get relevant information from the tool tip:
- The tool tip should describe accurately what each field is used for.
- Users should not have to specify any redundant configuration
- Users should get field level lineage for the source and sink that is being used.
- Reference documentation should be updated to account for the changes.
- The source code for Snowflake database plugin should be placed in repo under data-integrations org.
- The data pipeline using source and sink plugins should run on both mapreduce and spark engines.
User Stories
- User should be able to install Snowflake specific database source and sink plugins from the Hub
- Users should have each tool tip accurately describe what each field does
- Users should get field level lineage information for the Snowflake source and sink
- Users should be able to setup a pipeline avoiding specifying redundant information
- Users should get updated reference document for Snowflake source and sink
- Users should be able to read all the DB types
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Snowflake Overview
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.
Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.
Snowflake bulk API
Using JDBC for loading data has performance limitations. Snowflake provides bulk APIs for loading data.
COPY INTO <table> command loads data from staged files to an existing table. The files must already be staged in one of the following locations:
- Named internal stage (or table/user stage). Files can be staged using the PUT command.
- Named external stage that references an external location (AWS S3, Google Cloud Storage, or Microsoft Azure).
- External location (AWS S3, Google Cloud Storage, or Microsoft Azure).
Example:
-- Stages copy into mytable from '@mystage/path 1/file 1.csv'; copy into mytable from '@%mytable/path 1/file 1.csv'; copy into mytable from '@~/path 1/file 1.csv'; -- S3 bucket copy into mytable from 's3://mybucket 1/prefix 1/file 1.csv'; -- Azure container copy into mytable from 'azure://myaccount.blob.core.windows.net/mycontainer/encrypted_files/file 1.csv';
Also, there is an option to use Snowpipe to load data continuously.
Notes:
- Staged files can be validated without loading.
- Files can be purged after loading.
- There are several Format Types: CSV, JSON, AVRO, ORC, PARQUET, XML. Depending on the file format type specified, one or more format-specific options can be used.
- There is an option to Transform Data During a Load using Transformation Parameters.
COPY INTO <location> command unloads data from a table (or query) into one or more files in one of the following locations:
- Named internal stage (or table/user stage). The files can then be downloaded from the stage/location using the GET command.
- Named external stage that references an external location (AWS S3, Google Cloud Storage, or Microsoft Azure).
- External location (AWS S3, Google Cloud Storage, or Microsoft Azure).
Example:
-- Stages copy into '@mystage/path 1/file 1.csv' from mytable; copy into '@%mytable/path 1/file 1.csv' from mytable; copy into '@~/path 1/file 1.csv' from mytable; -- S3 bucket copy into 's3://mybucket 1/prefix 1/file 1.csv' from mytable; -- Azure container copy into 'azure://myaccount.blob.core.windows.net/mycontainer/encrypted_files/file 1.csv' from mytable;
Notes:
- Data to be Unloaded (from a Query) can be validated.
- There are several Format Types: CSV, JSON, PARQUET. Depending on the file format type specified, one or more format-specific options can be used.
Design Tips
JDBC Driver API Support: https://docs.snowflake.net/manuals/user-guide/jdbc-api.html
Loading Data into Snowflake: https://docs.snowflake.net/manuals/user-guide-data-load.html
Design
The suggestion is to create a new maven sub-module in the database-plugins repo under the data-integrations organization.
Action Plugins
The proposal is to create a Snowflake Data Loading Action Plugin that will utilize COPY INTO <table> command to load data files into Snowflake and Snowflake Data Unloading Action Plugin that will utilize COPY INTO <location> command to unload data from a table (or query) into one or more files.
Also, Snowflake Action Plugin that runs a SQL query should be created.
Batch Source/Sink Plugins
Option 1
The first option is to create a new maven sub-module in the database-plugins repo under the data-integrations organization for Batch Source/Sink Plugins and implement them using JDBC in the same way as other DB plugins.
Option 2
Using JDBC for loading data has performance limitations, so we can utilize Snowflake's bulk APIs for loading data.
Source Plugin
There is an option for the Source plugin to unload data from a table (or query) using COPY INTO <location> command into one or more files that than will be directly read by the plugin.
Example that unloads the result of a query into a named internal stage (my_stage) using a folder/filename prefix (result/data_), a named file format (myformat), and gzip compression:
copy into @my_stage/result/data_ from (select * from orderstiny) file_format=(format_name='myformat' compression='gzip'); ---------------+-------------+--------------+ rows_unloaded | input_bytes | output_bytes | ---------------+-------------+--------------+ 73 | 8339 | 3322 | ---------------+-------------+--------------+
Data Files than can be directly downloaded from an Internal Stage to a Stream:
Connection connection = DriverManager.getConnection(url, prop); InputStream out = connection.unwrap(SnowflakeConnection.class).downloadStream( "~", DEST_PREFIX + "/" + TEST_DATA_FILE + ".gz", true);
Files can be stored in one of the following locations:
- Named internal stage (or table/user stage). The files can then be downloaded from the stage/location using the GET command.
Named external stage that references an external location (AWS S3, Google Cloud Storage, or Microsoft Azure).
External location (AWS S3, Google Cloud Storage, or Microsoft Azure).
The proposal is to use an internal stage since files can then be downloaded directly using SnowflakeConnection#downloadStream method.
Source Splitter
The proposal is to determine the number of splits according to the number of staged files that were created using COPY INTO <location> command. The number of resulting files can be controlled using MAX_FILE_SIZE Copy Options. The proposal is to add "Maximum Split Size" Source configuration property which will use MAX_FILE_SIZE copy option.
LIST command returns a list of files that have been staged.
Example of listing the files that match a regular expression (i.e. all file names containing the string data_0) in a named stage (my_csv_stage) with a prefix (/analysis/):
list @my_csv_stage/analysis/ pattern='.*data_0.*'; +--------------------+------+----------------------------------+------------------------------+ | name | size | md5 | last_modified | |--------------------+------+----------------------------------+------------------------------| | employees01.csv.gz | 288 | a851f2cc56138b0cd16cb603a97e74b1 | Tue, 9 Jan 2018 15:31:44 GMT | | employees02.csv.gz | 288 | 125f5645ea500b0fde0cdd5f54029db9 | Tue, 9 Jan 2018 15:31:44 GMT | | employees03.csv.gz | 304 | eafee33d3e62f079a054260503ddb921 | Tue, 9 Jan 2018 15:31:45 GMT | | employees04.csv.gz | 304 | 9984ab077684fbcec93ae37479fa2f4d | Tue, 9 Jan 2018 15:31:44 GMT | | employees05.csv.gz | 304 | 8ad4dc63a095332e158786cb6e8532d0 | Tue, 9 Jan 2018 15:31:44 GMT | +--------------------+------+----------------------------------+------------------------------+
Sink Plugin
For the Sink, it's possible to write data to the internal stage files first(according to the File Sizing Recommendations) and then use COPY INTO <table> command.
Example of loading files from a named internal stage into a table:
copy into mytable from @my_int_stage;
Data Files can be uploaded directly from a Stream to an Internal Stage:
Connection connection = DriverManager.getConnection(url, prop); File file = new File("/tmp/test.csv"); FileInputStream fileInputStream = new FileInputStream(file); // upload file stream to user stage connection.unwrap(SnowflakeConnection.class).uploadStream("MYSTAGE", "testUploadStream", fileInputStream, "destFile.csv", true);
Files can be staged in one of the following locations:
- Named internal stage (or table/user stage). Files can be staged using the PUT command.
- Named external stage that references an external location (AWS S3, Google Cloud Storage, or Microsoft Azure).
- External location (AWS S3, Google Cloud Storage, or Microsoft Azure).
The proposal is to use an internal stage since files can be uploaded directly using SnowflakeConnection#uploadStream method.
Option 3
Although it is possible to create a Batch Sink Plugin that will accept files' locations and utilize COPY INTO <table> command (or Snowpipe) to load data files into Snowflake, it seems that it's not a good idea:
- COPY INTO <table> command uses locations of the raw data files, so it won't be possible to use this sink in the same way as other Database Sinks.
- No transformations can be done on actual data on the CDAP since we operate on file locations and not on the actual data. It is still possible to perform a transformation on the Snowflake side using Transformation Parameters.
Source Properties
Option 1
Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget |
---|---|---|---|---|---|---|
General | Label | Label for UI. | textbox | |||
Reference Name | Uniquely identified name for lineage. | referenceName | textbox | |||
Account Name | Full name of Snowflake account. | accountName | textbox | |||
Database | Database name to connect to. | database | textbox | |||
Import Query | Query for import data. | importQuery | textarea | |||
Credentials | Username | User identity for connecting to the specified database. | username | textbox | ||
Password | Password to use to connect to the specified database. | password | password | |||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle |
Key File Path | Path to the private key file. | path | textbox | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | |||
Advanced | Bounding Query | Bounding Query should return the min and max of the values of the 'splitBy' field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one. | boundingQuery | textarea | ||
Split-By Field Name | Field Name which will be used to generate splits. Not required if numSplits is set to one. | splitBy | textbox | |||
Number of Splits to Generate | Number of splits to generate. | numSplits | textbox | |||
Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Notes:
- Please, refer Plugin OAuth2 Common Module for OAuth2 common module design information.
Option 2
Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget |
---|---|---|---|---|---|---|
General | Label | Label for UI. | textbox | |||
Reference Name | Uniquely identified name for lineage. | referenceName | textbox | |||
Account Name | Full name of Snowflake account. | accountName | textbox | |||
Database | Database name to connect to. | database | textbox | |||
Import Query | Query for import data. | importQuery | textarea | |||
Credentials | Username | User identity for connecting to the specified database. | username | textbox | ||
Password | Password to use to connect to the specified database. | password | password | |||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle |
Key File Path | Path to the private key file. | path | textbox | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | |||
Advanced | Maximum Split Size | Maximum split size specified in bytes | maxSplitSize | number | ||
Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Notes:
- The table above is similar to the Option1.1 one except of splitter-related properties. Please, refer to the Design section for the proposal of the splitter design.
Source Data Types Mapping
Snowflake Data Types | CDAP Schema Data Type | Comment |
---|---|---|
NUMBER | decimal | Default precision and scale are (38,0). |
DECIMAL | decimal | Synonymous with NUMBER. |
NUMERIC | decimal | Synonymous with NUMBER. |
INT, INTEGER, BIGINT, SMALLINT | decimal | Synonymous with NUMBER, except that precision and scale cannot be specified (i.e. always defaults to NUMBER(38, 0)). |
FLOAT, FLOAT4, FLOAT8 | double | Snowflake uses double-precision (64 bit) IEEE 754 floating point numbers. |
DOUBLE | double | Synonymous with FLOAT. |
DOUBLE PRECISION | double | Synonymous with FLOAT. |
REAL | double | Synonymous with FLOAT. |
VARCHAR | string | Default (and maximum) is 16,777,216 bytes. |
CHAR, CHARACTER | string | Synonymous with VARCHAR except default length is VARCHAR(1). |
STRING | string | Synonymous with VARCHAR. |
TEXT | string | Synonymous with VARCHAR. |
BINARY | bytes | |
VARBINARY | bytes | Synonymous with BINARY. |
BOOLEAN | boolean | |
DATE | date | |
DATETIME | timestamp | Alias for TIMESTAMP_NTZ |
TIME | time | |
TIMESTAMP | timestamp/string | Alias for one of the TIMESTAMP variations (TIMESTAMP_NTZ by default). |
TIMESTAMP_LTZ | timestamp | TIMESTAMP with local time zone; time zone, if provided, is not stored. |
TIMESTAMP_NTZ | timestamp | TIMESTAMP with no time zone; time zone, if provided, is not stored. |
TIMESTAMP_TZ | string | TIMESTAMP with time zone. |
VARIANT | string | A tagged universal type, which can store values of any other type, including OBJECT and ARRAY, up to a maximum size of 16 MB compressed. |
OBJECT | record | |
ARRAY | array |
See:
Sink Properties
Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget |
---|---|---|---|---|---|---|
General | Label | Label for UI. | textbox | |||
Reference Name | Uniquely identified name for lineage. | referenceName | textbox | |||
Account Name | Full name of Snowflake account. | accountName | textbox | |||
Database | Database name to connect to connect to. | database | textbox | |||
Table Name | Name of a database table to write to. | table | textbox | |||
Credentials | Username | User identity for connecting to the specified database. | username | textbox | ||
Password | Password to use to connect to the specified database. | password | password | |||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle |
Key File Path | Path to the private key file. | path | textbox | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | |||
Advanced | Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Sink Data Types Mapping
CDAP Schema Data Type | Snowflake Data Types | Comment |
---|---|---|
boolean | BOOLEAN | |
bytes | BINARY | |
date | DATE | |
double | FLOAT | Snowflake uses double-precision (64 bit) IEEE 754 floating point numbers. |
decimal | NUMBER(s, p) | |
float | FLOAT | |
int | NUMBER(s, p) | Where p >= 10. It's safe to write primitives as values of decimal logical type in the case of valid precision. |
long | NUMBER(s, p) | Where p >= 19. It's safe to write primitives as values of decimal logical type in the case of valid precision. |
string | VARCHAR | |
time | TIME | |
timestamp | TIMESTAMP_NTZ | |
array | ARRAY | |
record | OBJECT | |
enum | VARCHAR | |
map | OBJECT | |
union | VARIANT |
Action Plugin Properties
Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget |
---|---|---|---|---|---|---|
General | Label | Label for UI | textbox | |||
Account Name | Full name of Snowflake account. | accountName | textbox | |||
Database | Database name to connect to connect to. | database | textbox | |||
Query | SQL query to run. | query | textarea | |||
Credentials | Username | User identity for connecting to the specified database. | username | textbox | ||
Password | Password to use to connect to the specified database. | password | password | |||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle |
Key File Path | Path to the private key file. | path | textbox | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | |||
Advanced | Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Snowflake Data Loading Action Plugin Properties
Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget | Comment |
---|---|---|---|---|---|---|---|
General | Label | Label for UI | textbox | ||||
Account Name | Full name of Snowflake account. | accountName | textbox | ||||
From | Internal or external location where the files containing data to be loaded are staged | location | textbox | ||||
Into | Name of the table into which data is loaded. | table | textbox | ||||
Select | Optional SELECT statement used for transformations. Specifies an explicit set of fields/columns (separated by commas) to load from the staged data files. The fields/columns are selected from the files using a standard SQL query. The list must match the sequence of columns in the target table. | select | textarea | ||||
Credentials | Username | Login name of the user for the connection. | username | textbox | |||
Password | Password for the specified user. | password | password | ||||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle | |
Key File Path | Path to the private key file. | path | textbox | Displayed only if Key Pair Authentication Enabled set to true. | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle | |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | Displayed only if OAuth2 Enabled set to true. | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Cloud Provider Parameters | Use Cloud Provider Parameters | If true, plugin will use specified Cloud Provider Parameters. |
| False | useCloudProviderParameters | toggle | |
Cloud Provider |
| GCP | cloudProvider | radio-group | Displayed only if Use Cloud Provider Parameters set to true. | ||
Storage Integration | Name of the storage integration used to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more details, see CREATE STORAGE INTEGRATION. | storageIntegration | textbox | Displayed only if GCP or AWS Cloud Provider selected. | |||
Key Id | Key Id for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | keyId | textbox | Displayed only if AWS Cloud Provider selected. | |||
Secret Key | Secret Key for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | secretKey | password | Displayed only if AWS Cloud Provider selected. | |||
Token | Token for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | token | textbox | Displayed only if AWS Cloud Provider selected. | |||
SAS Token | Shared access signature token for connecting to Azure and accessing the private/protected container where the files containing data are staged. Credentials are generated by Azure. | sasToken | textbox | Displayed only if Microsoft Azure Cloud Provider selected. | |||
Files Encrypted | If true, plugin will perform loading from encrypted files. |
| False | filesEncrypted | toggle | Displayed only if AWS or Microsoft Azure Cloud Provider selected. | |
Encryption Type | Encryption type used. | For AWS:
For Azure:
| encryptionType | select | Displayed only if Files Encrypted set to true. | ||
Master Key | Client-side master key that was used to encrypt the files in the bucket. The master key must be a 128-bit or 256-bit key in Base64-encoded form. Snowflake requires this key to decrypt encrypted files in the bucket and extract data for loading. | masterKey | textbox | Displayed only if to AWS_CSE or AZURE_CSE Encryption Type selected. | |||
Master Key Id | AWS Master Key ID. | masterKeyId | textbox | Displayed only if to AWS_SSE_KMS Encryption Type selected. | |||
File Format | File Format | Optional parameter to specify the format of the data files to load. |
| Undefined | fileFormat | toggle | |
Format Name | Existing named file format to use for loading data into the table. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for the data files. For more information, see CREATE FILE FORMAT. | fileFormatName | textbox | Displayed only if 'By Name' File Format selected. | |||
Format Type | Type of files to load into the table. If a format type is specified, then additional format-specific options can be specified. For more details, see Format Type Options. |
| fileFormatType | select | Displayed only if 'By Type' File Format selected. | ||
Format Type Options | Format-specific options separated by blank spaces, commas, or new lines. | typeOptions | textarea | Displayed only if CSV, JSON, AVRO, PARQUET or XML Format Type selected. ORC does not support any format type options. | |||
Advanced | Files | List of one or more files names (separated by commas) to be loaded. The files must already have been staged in either the Snowflake internal location or external location specified in the command. The maximum number of files names that can be specified is 1000. | files | textbox | |||
Pattern | Regular expression pattern string, enclosed in single quotes, specifying the file names and/or paths to match. For the best performance, try to avoid applying patterns that filter on a large number of files. | pattern | textbox | ||||
Copy Options | One or more copy options separated by blank spaces, commas, or new lines. | copyOptions | textarea | ||||
Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Notes:
- Snowflake Data Loading Action Plugin uses COPY INTO <table> command to load data files into Snowflake.
Snowflake Data Unloading Action Plugin Properties
User Configuration Label | Label Description | Options | Default | Variable | User Widget | Comment | |
---|---|---|---|---|---|---|---|
General | Label | Label for UI | textbox | ||||
Account Name | Full name of Snowflake account. | accountName | textbox | ||||
From | Source of the data to be unloaded, which can either be a table or a query. | source | textarea | ||||
Into | Internal or external location where the files containing data to be loaded are staged | location | textbox | ||||
Credentials | Username | Login name of the user for the connection. | username | textbox | |||
Password | Password for the specified user. | password | password | ||||
Key Pair Authentication | Key Pair Authentication Enabled | If true, plugin will perform Key Pair authentication. |
| False | keyPairEnabled | toggle | |
Key File Path | Path to the private key file. | path | textbox | Displayed only if Key Pair Authentication Enabled set to true. | |||
OAuth2 | OAuth2 Enabled | If true, plugin will perform OAuth2 authentication. |
| False | oauth2Enabled | toggle | |
Auth URL | Endpoint for the authorization server used to retrieve the authorization code. | authUrl | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Token URL | Endpoint for the resource server, which exchanges the authorization code for an access token. | tokenUrl | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Client ID | Client identifier obtained during the Application registration process. | clientId | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Client Secret | Client secret obtained during the Application registration process. | clientSecret | password | Displayed only if OAuth2 Enabled set to true. | |||
Scopes | Scope of the access request, which might have multiple space-separated values. | scopes | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Refresh Token | Token used to receive accessToken, which is end product of OAuth2. | refreshToken | textbox | Displayed only if OAuth2 Enabled set to true. | |||
Cloud Provider Parameters | Use Cloud Provider Parameters | If true, plugin will use specified Cloud Provider Parameters. |
| False | useCloudProviderParameters | toggle | |
Cloud Provider |
| GCP | cloudProvider | radio-group | Displayed only if Use Cloud Provider Parameters set to true. | ||
Storage Integration | Name of the storage integration used to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more details, see CREATE STORAGE INTEGRATION. | storageIntegration | textbox | Displayed only if GCP or AWS Cloud Provider selected. | |||
Key Id | Key Id for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | keyId | textbox | Displayed only if AWS Cloud Provider selected. | |||
Secret Key | Secret Key for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | secretKey | password | Displayed only if AWS Cloud Provider selected. | |||
Token | Token for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. For more information, see Configuring Secure Access to AWS S3. | token | textbox | Displayed only if AWS Cloud Provider selected. | |||
SAS Token | Shared access signature token for connecting to Azure and accessing the private/protected container where the files containing data are staged. Credentials are generated by Azure. | sasToken | textbox | Displayed only if Microsoft Azure Cloud Provider selected. | |||
Files Encrypted | If true, plugin will perform loading from encrypted files. |
| False | filesEncrypted | toggle | Displayed only if AWS or Microsoft Azure Cloud Provider selected. | |
Encryption Type | Encryption type used. | For AWS:
For Azure:
| encryptionType | select | Displayed only if Files Encrypted set to true. | ||
Master Key | Client-side master key that was used to encrypt the files in the bucket. The master key must be a 128-bit or 256-bit key in Base64-encoded form. Snowflake requires this key to decrypt encrypted files in the bucket and extract data for loading. | masterKey | textbox | Displayed only if to AWS_CSE or AZURE_CSE Encryption Type selected. | |||
Master Key Id | AWS Master Key ID. | masterKeyId | textbox | Displayed only if to AWS_SSE_KMS Encryption Type selected. | |||
File Format | File Format | Optional parameter to specify the format of the data files to load. |
| Undefined | fileFormat | toggle | |
Format Name | Existing named file format to use for loading data into the table. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for the data files. For more information, see CREATE FILE FORMAT. | fileFormatName | textbox | Displayed only if 'By Name' File Format selected. | |||
Format Type | Type of files to load into the table. If a format type is specified, then additional format-specific options can be specified. For more details, see Format Type Options. |
| fileFormatType | select | Displayed only if 'By Type' File Format selected. | ||
Format Type Options | Format-specific options separated by blank spaces, commas, or new lines. | typeOptions | textarea | Displayed only if 'By Type' File Format selected. | |||
Advanced | Copy Options | One or more copy options separated by blank spaces, commas, or new lines. | copyOptions | textarea | |||
Include Header | Specifies whether to include the table column headings in the output files. For more details, see Optional Parameters. |
| False | includeHeader | toggle | ||
Connection Arguments | A list of arbitrary string tag/value pairs as connection arguments. See: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string | connectionArguments | keyvalue |
Notes:
- Snowflake Data Unloading Action Plugin uses COPY INTO <location> command to unload data from a table (or query) into one or more files.
Approach
Create a new maven project in it's own repository.