Introduction

As of CDAP 5.1, a single Database source and sink plugin handles all types of databases. To improve the user experience, the plugins should be split into database-specific plugins (e.g., MySQL, Netezza), each with a custom logo and tool tips that help users configure that database (e.g., the connection string).

The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.

Use case(s)

Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2 and Postgres
Users should have a customized experience when configuring each DB plugin, with a custom logo specific to the database being used
Users should get relevant information from the tool tips:
- The tool tip for the connection string should be customized for the specific database
- Each tool tip should accurately describe what its field is used for
Users should get performance comparable to Sqoop, achieved by using Sqoop libraries for data ingestion and egress in the source and sink plugins
Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin)
Users should get field-level lineage for the source and sink being used
Reference documentation should be updated to account for the changes
All the DB types should be supported
The source code for each type of database should be separated out into repos under the data-integrations org
Integration tests for the specific plugins should be added to the test repos
Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines
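
The database-specific connection string tool tips called for above could draw on a small lookup of JDBC URL formats. The following is a minimal sketch only: the class name `ConnectionStringHints` and the method `tooltipFor` are hypothetical and not part of the existing database-plugins code, and the URL shapes are the commonly documented JDBC formats for each driver rather than text taken from this specification.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the existing plugin code.
class ConnectionStringHints {
  // Commonly documented JDBC URL shapes, keyed by a lower-case database name.
  private static final Map<String, String> FORMATS = new LinkedHashMap<>();

  static {
    FORMATS.put("mysql", "jdbc:mysql://<host>:<port>/<database>");
    FORMATS.put("postgres", "jdbc:postgresql://<host>:<port>/<database>");
    FORMATS.put("oracle", "jdbc:oracle:thin:@<host>:<port>:<SID>");
    FORMATS.put("sqlserver", "jdbc:sqlserver://<host>:<port>;databaseName=<database>");
    FORMATS.put("db2", "jdbc:db2://<host>:<port>/<database>");
    FORMATS.put("netezza", "jdbc:netezza://<host>:<port>/<database>");
  }

  // Returns the tool tip text for the given database type (case-insensitive).
  static String tooltipFor(String dbType) {
    String format = FORMATS.get(dbType.toLowerCase(Locale.ROOT));
    if (format == null) {
      throw new IllegalArgumentException("Unsupported database type: " + dbType);
    }
    return "Connection string format: " + format;
  }
}
```

For example, `ConnectionStringHints.tooltipFor("MySQL")` would yield a tool tip ending in `jdbc:mysql://<host>:<port>/<database>`. In practice the text would more likely live in each plugin's own widget or reference documentation files, which is one reason to separate the plugins per database.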

User Stories

Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2 and Postgres

Users should be able to install MySQL-specific database source and sink plugins from the Hub
Users should have each tool tip accurately describe what each field does
Users should be able to learn the format of the MySQL connection string by hovering over the tool tip for the connection string
Users should get field-level lineage information for the MySQL source and sink
Users should get performance comparable to Sqoop when ingesting data from MySQL and when writing data to MySQL (within ~15% of the time taken by Sqoop)
Users should be able to set up a pipeline without specifying redundant information
Users should get updated reference documentation for the MySQL source and sink
Users should be able to read all the DB types
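
The ~15% performance target in the stories above can be written as a simple pass/fail check for the Sqoop comparison tests. This is a sketch under stated assumptions: the class name `SqoopComparison` is hypothetical, and "within ~15%" is interpreted here as the plugin run taking at most 115% of the Sqoop run time for the same data set.

```java
// Hypothetical check for illustration; the 15% figure comes from the user story above.
class SqoopComparison {
  static final long TOLERANCE_PERCENT = 15;

  // True when the plugin run took no more than (100 + 15)% of the Sqoop run time.
  // Integer arithmetic keeps the boundary case exact.
  static boolean withinTolerance(long pluginMillis, long sqoopMillis) {
    if (pluginMillis < 0 || sqoopMillis <= 0) {
      throw new IllegalArgumentException("Timings must be positive");
    }
    return pluginMillis * 100 <= sqoopMillis * (100 + TOLERANCE_PERCENT);
  }
}
```

With a Sqoop baseline of 100,000 ms, a plugin run of 110,000 ms would pass and a run of 120,000 ms would fail. These numbers are illustrative only and are not measurements.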

Deliverables

Source code in the data-integrations org
Performance test comparison with Sqoop
Integration test code
Relevant documentation in the source repo and in the reference documentation section of each plugin

Relevant links

Existing DB plugin code: https://github.com/caskdata/hydrator-plugins/tree/develop/database-plugins
Data-integrations org: https://github.com/data-integrations/
Field level lineage: https://docs.cdap.io/cdap/5.1.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
Integration test repos: https://github.com/caskdata/cdap-integration-tests

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

This section defines properties that are configurable for this plugin.

User Facing Name	Type	Description	Constraints

Design / Implementation Tips

Tip #1
Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

Some future work – HYDRATOR-99999
Another future work – HYDRATOR-99999

Test Case(s)

Test case #1
Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data.

Pipeline #1

Pipeline #2

Checklist

User stories documented
User stories reviewed
Design documented
Design reviewed
Feature merged
Examples and guides
Integration tests
Documentation for feature
Short video demonstrating the feature

CDAP

Database plugin enhancements