Introduction
Amazon Aurora is a PostgreSQL-compatible database offered as a service. Users need to write data to and read data from AuroraDB.
Use-case
- Users would like to build a batch data pipeline that reads a complete table from an Amazon Aurora DB instance and writes it to BigTable.
- Users would like to build a batch data pipeline that performs upserts on AuroraDB tables (a minimal upsert sketch follows this list).
- Users should get relevant information from the tooltips while configuring the AuroraDB source and AuroraDB sink.
- The tooltip for the connection string should be customized for the specific database.
- Each tooltip should accurately describe what its field is used for.
- Users should get field-level lineage for the source and sink being used.
- Reference documentation should be available from the source and sink plugins.
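For the upsert use case, PostgreSQL-compatible databases express upserts with INSERT ... ON CONFLICT ... DO UPDATE (available since PostgreSQL 9.5). A minimal JDBC sketch, with a hypothetical table, columns, endpoint, and credentials:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UpsertExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical cluster endpoint, database, and credentials.
    String url = "jdbc:postgresql://mycluster.cluster-abc123.us-east-1.rds.amazonaws.com:5432/mydb";
    // Requires a unique constraint (here, the primary key) on the conflict column.
    String upsert =
        "INSERT INTO products (id, name, price) VALUES (?, ?, ?) "
      + "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, price = EXCLUDED.price";
    try (Connection conn = DriverManager.getConnection(url, "admin", "secret");
         PreparedStatement ps = conn.prepareStatement(upsert)) {
      ps.setInt(1, 1);
      ps.setString(2, "widget");
      ps.setDouble(3, 9.99);
      ps.executeUpdate();
    }
  }
}
```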
User Stories
- Users should be able to install the AuroraDB PostgreSQL source and sink plugins from the Hub
- Users should see tooltips that accurately describe what each field does
- Users should get field-level lineage information for the AuroraDB PostgreSQL source and sink
- Users should be able to set up a pipeline without specifying redundant information
- Users should get updated reference documentation for the AuroraDB PostgreSQL source and sink
- Users should be able to read all database column types supported by AuroraDB PostgreSQL
Deliverables
- Source code in the data-integrations org
- Integration test code
- Relevant documentation in the source repo and in the plugin's reference documentation section
Relevant links
- Data-integrations org: https://github.com/data-integrations/
- Field level lineage: https://docs.cdap.io/cdap/6.0.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
- Integration test repo: https://github.com/caskdata/cdap-integration-tests
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design / Implementation Tips
- Reuse the database-commons module from the database-plugins repo.
Design
- It is suggested to place the plugin code under the database-plugins repository to reuse existing database capabilities; a sketch of this reuse follows.
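A minimal sketch of the suggested reuse, assuming database-commons provides a base source class that delegates database-specific details to subclasses (the class and method names below are illustrative assumptions, not the exact database-commons API):

```java
// Sketch only: names are illustrative, not the actual database-commons API.
public class AuroraPostgresSource {
  private final String clusterEndpoint;
  private final int port;
  private final String database;

  public AuroraPostgresSource(String clusterEndpoint, int port, String database) {
    this.clusterEndpoint = clusterEndpoint;
    this.port = port;
    this.database = database;
  }

  // The Aurora-specific piece: building the JDBC connection string from the
  // Cluster Endpoint, Port, and Database properties. Schema mapping, split
  // generation, and record reading would come from the shared base class.
  protected String createConnectionString() {
    return String.format("jdbc:postgresql://%s:%d/%s", clusterEndpoint, port, database);
  }
}
```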
Source Properties
User-Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for the plugin in the UI | |
Reference Name | String | Unique name used to identify this source for lineage and metadata | Required |
Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres) |
Cluster Endpoint | String | Endpoint (host name) of the current master instance of the PostgreSQL cluster | Required |
Port | Number | Port on which the master instance of the PostgreSQL cluster listens | Optional (defaults to 5432) |
Database | String | Name of the database to connect to | Required |
Import Query | String | SELECT query used to import data from the database | Valid SQL query |
Username | String | Username to use to connect to the database | Required |
Password | Password | Password to use to connect to the database | Required |
Bounding Query | String | Query that returns the minimum and maximum values of the split-by field | Valid SQL query |
Split-By Field Name | String | Name of the field used to generate splits | |
Number of Splits to Generate | Number | Number of splits to generate | |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs passed as JDBC connection arguments; see the supported parameters at https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters | |
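The Bounding Query, Split-By Field Name, and Number of Splits to Generate properties work together to parallelize reads. A minimal sketch with hypothetical table and column names, assuming the common CDAP database-source convention that a $CONDITIONS placeholder in the Import Query is replaced with a per-split predicate at runtime:

```java
public class SplitQueryExample {
  public static void main(String[] args) {
    // Assumed convention: $CONDITIONS is replaced at runtime with a predicate
    // such as "id >= 0 AND id < 1000" for each generated split.
    String importQuery = "SELECT id, name, price FROM products WHERE $CONDITIONS";

    // The bounding query returns the min and max of the split-by field ("id");
    // that range is divided into "Number of Splits to Generate" contiguous slices.
    String boundingQuery = "SELECT MIN(id), MAX(id) FROM products";

    System.out.println(importQuery);
    System.out.println(boundingQuery);
  }
}
```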
Sink Properties
User-Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for the plugin in the UI | |
Reference Name | String | Unique name used to identify this sink for lineage and metadata | Required |
Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres) |
Host | String | Endpoint (host name) of the current master instance of the PostgreSQL cluster | Required |
Port | Number | Port on which the master instance of the PostgreSQL cluster listens | Optional (defaults to 5432) |
Database | String | Name of the database to connect to | Required |
Username | String | Username to use to connect to the database | Required |
Password | Password | Password to use to connect to the database | Required |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs passed as JDBC connection arguments; see the supported parameters at https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters | |
Table Name | String | Name of the database table to write to | Required |
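The Connection Arguments property maps onto the java.util.Properties object that the JDBC driver accepts. A minimal sketch with a hypothetical endpoint, credentials, and argument values (connectTimeout is one of the PostgreSQL driver parameters documented at the link above):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnectionArgumentsExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical cluster endpoint, port, and database.
    String url = "jdbc:postgresql://mycluster.cluster-abc123.us-east-1.rds.amazonaws.com:5432/mydb";

    Properties props = new Properties();
    props.setProperty("user", "admin");        // Username property
    props.setProperty("password", "secret");   // Password property
    props.setProperty("connectTimeout", "10"); // connection argument, in seconds

    try (Connection conn = DriverManager.getConnection(url, props)) {
      System.out.println("Connected to " + conn.getCatalog());
    }
  }
}
```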
Future Work
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.