Database plugin enhancements

Introduction

As of 5.1, a single Database source plugin and a single Database sink plugin handle all types of databases. To improve the user experience, the plugins should be split into database-specific plugins (e.g. MySQL, Netezza), each with a custom logo and tool tips that help users configure that particular database (e.g. the connection string).

The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.

Use case(s)

  • Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2 and Postgres.
  • Users should have a customized experience when configuring each of the DB plugins, with a custom logo specific to the database that is being used.
  • Users should get relevant information from the tool tip 
    • The tool tip for the connection string should be customized specific to the database. 
    • The tool tip should describe accurately what each field is used for
  • Users should get performance comparable to Sqoop, achieved by using Sqoop libraries for data ingestion and egress in the source and sink plugins
  • Users should not have to specify any redundant configuration (ex: JDBC type in source plugin, columns in the sink plugin)
  • Users should get field level lineage for the source and sink that is being used
  • Reference documentation should be updated to account for the changes 
  • All the DB types should be supported
  • The source code for each type of database should be separated out into its own repo under the data-integrations org
  • Integration tests for specific plugins should be added in the test repos
  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines
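
As an illustration of the database-specific connection string tool tips described above, the following is a minimal sketch of a lookup from database type to a connection string format hint. The class and method names (ConnectionStringHints, hintFor) are hypothetical and not part of the existing plugin code. The URL shapes are the commonly used JDBC forms for each driver; the exact syntax may vary with the driver version, so the final tool tip text should be checked against each driver's documentation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a database type to a connection string format hint. */
final class ConnectionStringHints {
  private static final Map<String, String> HINTS = new LinkedHashMap<>();

  static {
    // Commonly used JDBC URL shapes; exact syntax depends on the driver in use.
    HINTS.put("mysql", "jdbc:mysql://<host>:<port>/<database>");
    HINTS.put("postgres", "jdbc:postgresql://<host>:<port>/<database>");
    HINTS.put("oracle", "jdbc:oracle:thin:@<host>:<port>:<SID>");
    HINTS.put("sqlserver", "jdbc:sqlserver://<host>:<port>;databaseName=<database>");
    HINTS.put("db2", "jdbc:db2://<host>:<port>/<database>");
    HINTS.put("netezza", "jdbc:netezza://<host>:<port>/<database>");
  }

  private ConnectionStringHints() { }

  /** Returns the format hint for the given database type (case-insensitive). */
  static String hintFor(String databaseType) {
    String hint = HINTS.get(databaseType.toLowerCase(Locale.ROOT));
    if (hint == null) {
      throw new IllegalArgumentException("Unknown database type: " + databaseType);
    }
    return hint;
  }
}
```

A MySQL plugin would then show the value of ConnectionStringHints.hintFor("mysql") in the tool tip for its connection string field, rather than a generic JDBC description.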

User Stories

Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2 and Postgres

  • Users should be able to install MySQL-specific database source and sink plugins from the Hub
  • Users should have each tool tip accurately describe what each field does
  • Users should know the format for the MySQL connection string by hovering over the tool tip for the connection string
  • Users should get field level lineage information for the MySQL source and sink
  • Users should get performance comparable to Sqoop when ingesting data from MySQL and when writing data to MySQL (within ~15% of the time taken by Sqoop)
  • Users should be able to set up a pipeline without specifying redundant information
  • Users should get an updated reference document for the MySQL source and sink
  • Users should be able to read all the DB types
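
One way the sink could avoid asking users for a redundant column list is to derive the columns from the field names of the incoming records. The sketch below is only an illustration under that assumption: InsertStatementBuilder and buildInsert are hypothetical names, the field names are assumed to match the table's column names, and identifier quoting or escaping is deliberately left out.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: builds a parameterized INSERT from input field names. */
final class InsertStatementBuilder {
  private InsertStatementBuilder() { }

  /**
   * Builds "INSERT INTO table (c1, c2) VALUES (?, ?)" from the input field names,
   * so the user does not have to list the columns in the sink configuration.
   * Assumes field names equal column names; no identifier quoting is applied.
   */
  static String buildInsert(String table, List<String> fieldNames) {
    if (fieldNames.isEmpty()) {
      throw new IllegalArgumentException("At least one field is required");
    }
    String columns = String.join(", ", fieldNames);
    // One "?" placeholder per column, for use with a JDBC PreparedStatement.
    String placeholders = String.join(", ", Collections.nCopies(fieldNames.size(), "?"));
    return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
  }
}
```

The resulting string can be passed to java.sql.Connection.prepareStatement, with values bound in the same order as the field names.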

Deliverables 

  • Source code in the data-integrations org
  • Performance test comparison with Sqoop
  • Integration test code 
  • Relevant documentation in the source repo and reference documentation section in plugin
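
For the performance test comparison, the "within ~15% of the time taken for sqoop" target from the user stories can be expressed as a simple check. The sketch below assumes the target means the plugin run may take at most 15% longer than the equivalent Sqoop run; the class name SqoopComparison and the method isComparable are hypothetical, and the timings are whatever wall-clock measurements the performance test collects.

```java
/** Hypothetical helper for the Sqoop performance comparison. */
final class SqoopComparison {
  /** Assumed tolerance: the plugin may be up to 15% slower than Sqoop. */
  static final double TOLERANCE = 0.15;

  private SqoopComparison() { }

  /**
   * Returns true when the plugin run time is no more than 15% above the
   * Sqoop run time for the same data set. Both times are in milliseconds.
   */
  static boolean isComparable(long pluginMillis, long sqoopMillis) {
    if (sqoopMillis <= 0) {
      throw new IllegalArgumentException("Sqoop run time must be positive");
    }
    return pluginMillis <= sqoopMillis * (1.0 + TOLERANCE);
  }
}
```

For example, if Sqoop takes 100 seconds for a table, a plugin run of 110 seconds would meet the target under this reading, while a run of 120 seconds would not.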

Relevant links 

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Configurables

This section defines properties that are configurable for this plugin. 

User Facing Name | Type | Description | Constraints

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature