Introduction

A separate database plugin to support Snowflake-specific features and configurations.

Use-Case

  • Users can choose and install Snowflake source and sink plugins.
  • Users should see the Snowflake logo on the plugin configuration page for a better experience.
  • Users should get relevant information from the tool tip:
    • The tool tip should describe accurately what each field is used for.
  • Users should not have to specify any redundant configuration.
  • Users should get field level lineage for the source and sink that is being used.
  • Reference documentation should be updated to account for the changes.
  • The source code for the Snowflake database plugin should be placed in a repository under the data-integrations org.
  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

  • Users should be able to install Snowflake-specific database source and sink plugins from the Hub.
  • Users should see tool tips that accurately describe what each field does.
  • Users should get field level lineage information for the Snowflake source and sink.
  • Users should be able to set up a pipeline without specifying redundant information.
  • Users should get an updated reference document for the Snowflake source and sink.
  • Users should be able to read all the database data types.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Snowflake Overview

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake describes its data warehouse as faster, easier to use, and far more flexible than traditional data warehouse offerings.

Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.


Design Tips

JDBC Driver API Support: https://docs.snowflake.net/manuals/user-guide/jdbc-api.html

Loading Data into Snowflake: https://docs.snowflake.net/manuals/user-guide-data-load.html
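
To make the JDBC option concrete, the following is a minimal sketch of how a plugin might assemble a Snowflake JDBC URL and credentials from its configuration. The class name, method names, and the choice of parameters (db, schema, warehouse) are assumptions for illustration and are not part of this design; the sketch only builds strings and does not open a connection.

```java
import java.util.Properties;

/** Hypothetical helper (not part of the plugin): builds Snowflake JDBC connection settings. */
final class SnowflakeConnectionSketch {

  private SnowflakeConnectionSketch() { }

  /**
   * Builds a URL of the form
   * jdbc:snowflake://ACCOUNT.snowflakecomputing.com/?db=...&schema=...&warehouse=...
   * Assumption: all values are plain identifiers that need no URL encoding.
   */
  static String jdbcUrl(String account, String database, String schema, String warehouse) {
    return "jdbc:snowflake://" + account + ".snowflakecomputing.com/"
        + "?db=" + database
        + "&schema=" + schema
        + "&warehouse=" + warehouse;
  }

  /** Packs the credentials into the Properties object that JDBC drivers accept. */
  static Properties credentials(String user, String password) {
    Properties props = new Properties();
    props.setProperty("user", user);
    props.setProperty("password", password);
    return props;
  }
}
```

With the Snowflake JDBC driver on the classpath, the two results could be passed to java.sql.DriverManager.getConnection(url, props). The account and object names are supplied by the caller, so nothing here depends on a particular Snowflake deployment.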


Design

The suggestion is to create a new Maven project in its own repository.

Snowflake bulk API

Using JDBC for loading data has performance limitations. Snowflake provides bulk APIs for loading data: https://docs.snowflake.net/manuals/user-guide-data-load.html
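
As a rough illustration of the bulk path, the sketch below builds the two SQL statements that a bulk load typically consists of: PUT to upload a local file to the table's stage, then COPY INTO to load the staged file. The class and method names are hypothetical, and the use of the table stage (@%table) with a CSV file format is an assumption chosen for brevity, not a decision made in this design.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the plugin): SQL for a staged bulk load. */
final class SnowflakeBulkLoadSketch {

  private SnowflakeBulkLoadSketch() { }

  /** PUT uploads a local file to the table stage, written as @%table. */
  static String putStatement(String localPath, String table) {
    return "PUT file://" + localPath + " @%" + table;
  }

  /** COPY INTO loads the staged files into the table. Assumption: CSV files. */
  static String copyStatement(String table) {
    return "COPY INTO " + table + " FROM @%" + table + " FILE_FORMAT = (TYPE = CSV)";
  }

  /** The statements in the order they would be executed. */
  static List<String> loadStatements(String localPath, String table) {
    return Arrays.asList(putStatement(localPath, table), copyStatement(table));
  }
}
```

Each statement could then be run in order through a java.sql.Statement on a Snowflake JDBC connection. Whether the sink should stage files this way or use another bulk option from the linked guide remains an open design choice.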

Source Properties

Section | User Configuration Label | Label Description                     | Options | Default | Variable      | User Widget
General | Label                    | Label for UI.                         |         |         |               | textbox
        | Reference Name           | Uniquely identified name for lineage. |         |         | referenceName | textbox


Source Data Types Mapping

Snowflake Data Types | CDAP Schema Data Type


See: 
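
The mapping table above is not yet filled in. Purely as a sketch of how the source could translate JDBC column types into CDAP schema types once the mapping is decided, the example below maps a few java.sql.Types codes to a local enum that mirrors some CDAP Schema.Type names. Every individual mapping, the enum, and the class name are assumptions for illustration, not the agreed mapping.

```java
import java.sql.Types;

/** Hypothetical helper (not part of the plugin): JDBC type code to CDAP type name. */
final class SnowflakeTypeMappingSketch {

  /** Local stand-in for a subset of CDAP's Schema.Type values. */
  enum CdapType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING }

  private SnowflakeTypeMappingSketch() { }

  /** Assumed mapping; unsupported codes fail fast instead of being guessed. */
  static CdapType toCdapType(int sqlType) {
    switch (sqlType) {
      case Types.BOOLEAN:
        return CdapType.BOOLEAN;
      case Types.TINYINT:
      case Types.SMALLINT:
      case Types.INTEGER:
        return CdapType.INT;
      case Types.BIGINT:
        return CdapType.LONG;
      case Types.REAL:
        return CdapType.FLOAT;
      case Types.FLOAT:
      case Types.DOUBLE:
        return CdapType.DOUBLE;
      case Types.BINARY:
      case Types.VARBINARY:
        return CdapType.BYTES;
      case Types.CHAR:
      case Types.VARCHAR:
        return CdapType.STRING;
      default:
        throw new IllegalArgumentException("Unsupported SQL type code: " + sqlType);
    }
  }
}
```

Throwing on unknown codes keeps gaps in the mapping visible during pipeline validation rather than silently coercing values to strings.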

Sink Properties

Section | User Configuration Label | Label Description                     | Options | Default | Variable      | User Widget
General | Label                    | Label for UI.                         |         |         |               | textbox
        | Reference Name           | Uniquely identified name for lineage. |         |         | referenceName | textbox


Sink Data Types Mapping

CDAP Schema Data Type | Snowflake Data Types


Action Properties


Section | User Configuration Label | Label Description | Options | Default | Variable | User Widget
General | Label                    | Label for UI      |         |         |          | textbox


Approach

Create a new Maven project in its own repository.

Pipeline Samples


Releases

Release X.Y.Z

Related Work

Database plugin enhancements
