Database plugin enhancements
- Sree Raman
- Illia
Owned by Sree Raman
Introduction
As of 5.1, a single Database source plugin and a single Database sink plugin handle all types of databases. To improve the user experience, the plugins should be split into database-specific plugins (e.g., MySQL, Netezza), each with a custom logo and tool tips that help users configure that particular database (e.g., the connection string).
The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.
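One way to picture the reuse goal is a shared base class that holds the common logic, with each database-specific plugin supplying only what differs. The following is a minimal sketch under that assumption; the class names `AbstractDbSourceSketch` and `MysqlSourceSketch` and their methods are hypothetical and are not taken from the existing plugin code or the CDAP API.

```java
import java.util.Objects;

// Hypothetical shared base: logic common to every database-specific source lives here once.
abstract class AbstractDbSourceSketch {
  protected final String host;
  protected final int port;
  protected final String database;

  AbstractDbSourceSketch(String host, int port, String database) {
    this.host = Objects.requireNonNull(host, "host");
    this.port = port;
    this.database = Objects.requireNonNull(database, "database");
  }

  // Each database-specific plugin overrides only the parts that differ.
  abstract String connectionString();

  abstract String displayName();

  // Shared behavior, written once and inherited by every subclass.
  String describe() {
    return displayName() + " source reading from " + connectionString();
  }
}

// Hypothetical MySQL-specific plugin: supplies its name and connection string format.
final class MysqlSourceSketch extends AbstractDbSourceSketch {
  MysqlSourceSketch(String host, int port, String database) {
    super(host, port, database);
  }

  @Override
  String connectionString() {
    return "jdbc:mysql://" + host + ":" + port + "/" + database;
  }

  @Override
  String displayName() {
    return "MySQL";
  }
}
```

With this shape, adding another database would mean adding one small subclass, while fixes to the shared logic apply to all of them.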
Use case(s)
- Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2, and Postgres.
- Users should have a customized experience when configuring each DB plugin, with a custom logo specific to the database being used.
- Users should get relevant information from the tool tips
- The tool tip for the connection string should be customized for the specific database.
- Each tool tip should accurately describe what its field is used for
- Users should get performance comparable to Sqoop; the source and sink plugins should use Sqoop libraries for data ingestion and egress
- Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin)
- Users should get field-level lineage for the source and sink being used
- Reference documentation should be updated to account for the changes
- All the DB types should be supported
- The source code for each type of database should be separated into its own repo under the data-integrations org
- Integration tests for specific plugins should be added in the test repos
- Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines
User Stories
Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2, and Postgres
- Users should be able to install MySQL-specific database source and sink plugins from the Hub
- Users should have each tool tip accurately describe what each field does
- Users should be able to learn the format of the MySQL connection string by hovering over the tool tip for the connection string
- Users should get field-level lineage information for the MySQL source and sink
- Users should get performance comparable to Sqoop when ingesting data from MySQL and when writing data to MySQL (within ~15% of the time taken by Sqoop)
- Users should be able to set up a pipeline without specifying redundant information
- Users should get updated reference documentation for the MySQL source and sink
- Users should be able to read all the DB types
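To illustrate the database-specific connection string tool tips, here is a minimal sketch of a lookup from database name to a connection string format. The class `ConnectionStringTooltips` and the method `tooltipFor` are hypothetical. The formats shown are the commonly used JDBC URL shapes for each vendor's driver and should be checked against the driver documentation before being used in actual tool tip text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a database name to the connection string format shown in its tool tip.
final class ConnectionStringTooltips {
  private static final Map<String, String> FORMATS = new LinkedHashMap<>();

  static {
    // Commonly used JDBC URL shapes (assumption: verify against each driver's documentation).
    FORMATS.put("mysql", "jdbc:mysql://<host>:<port>/<database>");
    FORMATS.put("postgres", "jdbc:postgresql://<host>:<port>/<database>");
    FORMATS.put("oracle", "jdbc:oracle:thin:@<host>:<port>:<SID>");
    FORMATS.put("sqlserver", "jdbc:sqlserver://<host>:<port>;databaseName=<database>");
    FORMATS.put("db2", "jdbc:db2://<host>:<port>/<database>");
    FORMATS.put("netezza", "jdbc:netezza://<host>:<port>/<database>");
  }

  private ConnectionStringTooltips() {
  }

  // Returns the tool tip text for the given database, ignoring case.
  static String tooltipFor(String databaseName) {
    String format = FORMATS.get(databaseName.toLowerCase(Locale.ROOT));
    if (format == null) {
      throw new IllegalArgumentException("Unknown database: " + databaseName);
    }
    return "JDBC connection string, for example: " + format;
  }
}
```

For example, `ConnectionStringTooltips.tooltipFor("MySQL")` returns the MySQL format, while an unsupported name fails fast with an `IllegalArgumentException`.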
Deliverables
- Source code in the data-integrations org
- Performance test comparison with Sqoop
- Integration test code
- Relevant documentation in the source repo and reference documentation section in plugin
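For the performance comparison, the ~15% target from the user stories can be expressed as a simple pass/fail check. The sketch below assumes one interpretation of that target: the plugin run may take at most 15% longer than the Sqoop run on the same data. The class `SqoopComparisonSketch` and the method `withinTolerance` are hypothetical.

```java
// Hypothetical check for the performance deliverable.
final class SqoopComparisonSketch {
  private SqoopComparisonSketch() {
  }

  // Returns true when the plugin's elapsed time is no more than
  // (1 + tolerance) times the Sqoop baseline, e.g. tolerance = 0.15 for 15%.
  static boolean withinTolerance(long pluginMillis, long sqoopMillis, double tolerance) {
    if (sqoopMillis <= 0) {
      throw new IllegalArgumentException("Sqoop baseline must be positive");
    }
    if (tolerance < 0) {
      throw new IllegalArgumentException("Tolerance must not be negative");
    }
    return pluginMillis <= sqoopMillis * (1.0 + tolerance);
  }
}
```

With a Sqoop baseline of 1000 ms and a tolerance of 0.15, a plugin run of 1100 ms passes and a run of 1200 ms fails.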
Relevant links
- Existing DB plugin code: https://github.com/caskdata/hydrator-plugins/tree/develop/database-plugins
- Data-integrations org: https://github.com/data-integrations/
- Field level lineage: https://docs.cdap.io/cdap/5.1.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
- Integration test repos: https://github.com/caskdata/cdap-integration-tests
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Design / Implementation Tips
- Tip #1
- Tip #2
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature