Introduction

A separate database plugin to support PostgreSQL-specific features and configurations.

Use-Case

Users can choose and install PostgreSQL source and sink plugins.
Users should see the PostgreSQL logo on the plugin configuration page for a better experience.
Users should get relevant information from the tooltips:
- The tooltip for the connection string should be customized specifically for the PostgreSQL database.
- Each tooltip should accurately describe what its field is used for.
Users should get performance comparable to Sqoop, achieved by using Sqoop libraries for data ingestion and egress.
Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin).
Users should get field-level lineage for the source and sink that are being used.
Reference documentation should be updated to account for the changes.
The source code for the PostgreSQL database plugin should be placed in a repo under the data-integrations org.
Integration tests for the PostgreSQL database plugin should be added in the test repo.
Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

Users should be able to install PostgreSQL-specific database source and sink plugins from the Hub.
Users should have each tooltip accurately describe what each field does.
Users should get field-level lineage information for the PostgreSQL source and sink.
Users should get performance comparable to Sqoop when ingesting data from PostgreSQL and when writing data to PostgreSQL (within ~15% of the time taken by Sqoop).
Users should be able to set up a pipeline without specifying redundant information.
Users should get an updated reference document for the PostgreSQL source and sink.
Users should be able to read all the DB types.

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

PostgreSQL connector reference: https://jdbc.postgresql.org/download/postgresql-9.4.1211.jar

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

PostgreSQL datatypes mappings and conversions:

Design

The suggestion is to create a Maven submodule for PostgreSQL under the database-plugins repo.

Sink Properties

User Facing Name	Type	Description	Constraints
Label	String	Label for UI
Reference Name	String	Uniquely identified name for lineage
Host	String	PostgreSQL host	Required (defaults to localhost on UI)
Port	Number	Port on which PostgreSQL is listening	Optional (default 5432)
Database	String	Name of the database to connect to	Required
Import Query	String	Query used to import data	Valid SQL query
Username	String	DB username	Required
Password	Password	User password	Required
Bounding Query	String	Returns the min and max of the Split-By Field	Valid SQL query
Split-By Field Name	String	Field name which will be used to generate splits
Number of Splits to Generate	Number	Number of splits to generate
Transaction Isolation Level	Select	Transaction isolation level for queries run by this sink
Connection Arguments	Keyvalue	A list of arbitrary string key/value pairs passed as connection arguments. For the list of supported properties, see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
Table Name	String	Name of a database table to write to
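
As an illustration of the Transaction Isolation Level property, the following minimal sketch shows how a selected option could be translated into the constants defined on java.sql.Connection. The option names and the class and method names are assumptions made for this example; the option list the plugin actually exposes is not specified here.

```java
import java.sql.Connection;

public class IsolationLevelSketch {

  // Maps a (hypothetical) option name from the Select widget to the
  // corresponding JDBC constant on java.sql.Connection.
  static int toJdbcLevel(String option) {
    switch (option) {
      case "TRANSACTION_READ_UNCOMMITTED":
        return Connection.TRANSACTION_READ_UNCOMMITTED;
      case "TRANSACTION_READ_COMMITTED":
        return Connection.TRANSACTION_READ_COMMITTED;
      case "TRANSACTION_REPEATABLE_READ":
        return Connection.TRANSACTION_REPEATABLE_READ;
      case "TRANSACTION_SERIALIZABLE":
        return Connection.TRANSACTION_SERIALIZABLE;
      default:
        throw new IllegalArgumentException("Unknown isolation level: " + option);
    }
  }

  public static void main(String[] args) {
    // The resulting int is what Connection.setTransactionIsolation(int) expects.
    System.out.println(toJdbcLevel("TRANSACTION_SERIALIZABLE"));
  }
}
```

The returned value would be passed to Connection.setTransactionIsolation before the sink starts writing.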

Source Properties

User Facing Name	Type	Description	Constraints
Label	String	Label for UI
Reference Name	String	Uniquely identified name for lineage
Host	String	PostgreSQL host	Required (defaults to localhost on UI)
Port	Number	Port on which PostgreSQL is listening	Optional (default 5432)
Database	String	Name of the database to connect to	Required
Import Query	String	Query used to import data	Valid SQL query
Username	String	DB username	Required
Password	String	User password	Required
Bounding Query	String	Returns the min and max of the Split-By Field	Valid SQL query
Split-By Field Name	String	Field name which will be used to generate splits
Number of Splits to Generate	Number	Number of splits to generate
Transaction Isolation Level	Select	Transaction isolation level for queries run by this source
Connection Arguments	Keyvalue	A list of arbitrary string key/value pairs passed as connection arguments. For the list of supported properties, see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
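
To make the relationship between Bounding Query, Split-By Field Name and Number of Splits to Generate concrete, here is a minimal sketch of how a numeric range could be divided into splits. It assumes an integer split-by field and that the bounding query (for example, a hypothetical SELECT MIN(id), MAX(id) FROM orders) has already returned its two values. The class and method names are illustrative, and this is not the splitter the plugin or Sqoop actually uses.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

  // Partitions the inclusive range [min, max] into at most numSplits
  // contiguous, non-overlapping ranges and returns one SQL condition per range.
  // Assumes max - min is far below Long.MAX_VALUE (no overflow handling).
  static List<String> splitConditions(String field, long min, long max, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be at least 1");
    }
    if (max < min) {
      throw new IllegalArgumentException("max must not be smaller than min");
    }
    List<String> conditions = new ArrayList<>();
    long total = max - min + 1;          // number of distinct values in the range
    long base = total / numSplits;       // minimum size of every split
    long remainder = total % numSplits;  // the first 'remainder' splits get one extra value
    long lower = min;
    for (int i = 0; i < numSplits && lower <= max; i++) {
      long size = base + (i < remainder ? 1 : 0);
      long upper = lower + size - 1;
      conditions.add(field + " >= " + lower + " AND " + field + " <= " + upper);
      lower = upper + 1;
    }
    return conditions;
  }

  public static void main(String[] args) {
    // Bounds as they might be returned by the bounding query.
    for (String condition : splitConditions("id", 1, 10, 3)) {
      System.out.println(condition);
    }
  }
}
```

Each condition would restrict the import query to one slice of the table, so that the slices can be read in parallel. When the range holds fewer values than the requested number of splits, the sketch simply returns fewer conditions.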

Action Properties

User Facing Name	Type	Description	Constraints
Label	String	Label for UI
Host	String	PostgreSQL host	Required (defaults to localhost on UI)
Port	Number	Port on which PostgreSQL is listening	Optional (default 5432)
Database	String	Name of the database to connect to	Required
Username	String	DB username	Required
Password	String	User password	Required
Connection Arguments	Keyvalue	A list of arbitrary string key/value pairs passed as connection arguments. For the list of supported properties, see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
Database Command	String	Database command to run	Valid SQL query
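
Because Host, Port and Database are separate fields rather than a single connection string, the plugin has to assemble the JDBC URL itself. The following is a minimal sketch of that step, using the jdbc:postgresql://host:port/database URL form of the PostgreSQL JDBC driver. The class name, method names and sample values are assumptions made for this example, not the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class PostgresConnectionSketch {

  // Builds a PostgreSQL JDBC URL from the separate Host, Port and Database fields.
  // A null port falls back to 5432, the default listed in the properties tables.
  static String buildConnectionString(String host, Integer port, String database) {
    int effectivePort = (port == null) ? 5432 : port;
    return "jdbc:postgresql://" + host + ":" + effectivePort + "/" + database;
  }

  // Collects Username, Password and the Connection Arguments key/value pairs
  // into a Properties object, the form accepted by
  // java.sql.DriverManager.getConnection(String url, Properties info).
  static Properties buildConnectionProperties(String user, String password,
                                              Map<String, String> connectionArguments) {
    Properties props = new Properties();
    props.setProperty("user", user);
    props.setProperty("password", password);
    for (Map.Entry<String, String> argument : connectionArguments.entrySet()) {
      props.setProperty(argument.getKey(), argument.getValue());
    }
    return props;
  }

  public static void main(String[] args) {
    // Sample values are hypothetical.
    Map<String, String> arguments = new LinkedHashMap<>();
    arguments.put("ssl", "true");

    String url = buildConnectionString("localhost", null, "sales");
    Properties props = buildConnectionProperties("etl_user", "secret", arguments);

    System.out.println(url);
    System.out.println(props.getProperty("ssl"));
  }
}
```

Keeping the connection arguments in a Properties object rather than appending them to the URL avoids having to URL-encode their values.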

Approach

Create a module postgresql-plugin in the database-plugins project, reusing existing database-plugins code where possible. Add PostgreSQL-specific properties to the configuration and add support for PostgreSQL-specific datatypes. Update the UI widget JSON definitions.

Pipeline Samples

API changes

Deprecated Programmatic APIs

database-plugins is moved to Data Integrations

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The PostgreSQL source and sink appear as separate entries, each with the PostgreSQL logo, in the source and sink lists.

PostgreSQL database plugin
