Amazon AuroraDB PostgreSQL plugin

Introduction

Amazon Aurora is a PostgreSQL-compatible database offered as a managed service. Users need to read from and write to AuroraDB.

Use-case

  • Users would like to build a batch data pipeline that reads a complete table from an Amazon Aurora DB instance and writes it to BigTable. 
  • Users would like to build a batch data pipeline that performs upserts on AuroraDB tables 
  • Users should get relevant information from the tool tip while configuring the AuroraDB source and AuroraDB sink
    • The tool tip for the connection string should be customized specific to the database. 
    • The tool tip should describe accurately what each field is used for
  • Users should get field-level lineage for the source and sink being used
  • Reference documentation should be available from the source and sink plugins
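The upsert use case above maps naturally to PostgreSQL's INSERT ... ON CONFLICT syntax. The following is only an illustrative sketch of how the sink could generate such a statement; buildUpsertSql, the class name, and the example users table are hypothetical and not part of the plugin's actual code:

```java
import java.util.Collections;

/** Illustrative sketch (not the plugin's actual code): building a PostgreSQL
 *  upsert statement for a table, its columns, and a conflict key column. */
public class UpsertSketch {
    static String buildUpsertSql(String table, String[] columns, String keyColumn) {
        String cols = String.join(", ", columns);
        // One "?" placeholder per column, for use with a JDBC PreparedStatement.
        String placeholders = String.join(", ", Collections.nCopies(columns.length, "?"));
        StringBuilder set = new StringBuilder();
        for (String c : columns) {
            if (!c.equals(keyColumn)) {
                if (set.length() > 0) {
                    set.append(", ");
                }
                // EXCLUDED refers to the row that failed to insert.
                set.append(c).append(" = EXCLUDED.").append(c);
            }
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
                + " ON CONFLICT (" + keyColumn + ") DO UPDATE SET " + set;
    }

    public static void main(String[] args) {
        // Hypothetical table "users" with primary key "id".
        System.out.println(buildUpsertSql("users", new String[]{"id", "name"}, "id"));
        // INSERT INTO users (id, name) VALUES (?, ?) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name
    }
}
```

The generated statement inserts a new row, or updates the non-key columns of the existing row when the key already exists.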

User Stories

  • Users should be able to install the AuroraDB PostgreSQL source and sink plugins from the Hub
  • Each tool tip should accurately describe what the corresponding field does
  • Users should get field-level lineage information for the AuroraDB PostgreSQL source and sink 
  • Users should be able to set up a pipeline without specifying redundant information
  • Users should get updated reference documentation for the AuroraDB PostgreSQL source and sink
  • Users should be able to read all supported database types

Deliverables 

  • Source code in data integrations org
  • Integration test code 
  • Relevant documentation in the source repo and in the plugin's reference documentation section

Relevant links 

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design / Implementation Tips

Design

  • It is suggested to place the plugin code under the database-plugin repository to reuse existing database capabilities.

Source Properties

User Facing Name | Type | Description | Constraints
Label | String | Label for UI |
Reference Name | String | Uniquely identifying name used for lineage | Required
Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres)
Cluster Endpoint | String | URL of the current master instance of the PostgreSQL cluster | Required
Port | Number | Port of the PostgreSQL cluster's master instance | Optional (defaults to 5432)
Database | String | Name of the database to connect to | Required
Import Query | String | Query used to import data | Valid SQL query
Username | String | DB username | Required
Password | String | User password | Required
Bounding Query | String | Query that returns the min and max values of the split-by field | Valid SQL query
Split-By Field Name | String | Name of the field used to generate splits |
Number of Splits to Generate | Number | Number of splits to generate |
Connection Arguments | Key-value | A list of arbitrary string key/value pairs used as connection arguments; see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters |
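The Bounding Query, Split-By Field Name, and Number of Splits properties work together: the bounding query supplies the min and max of the split-by field, and the source divides that range into one predicate per split so the table can be read in parallel. A minimal sketch of that division, assuming a numeric split-by field (SplitSketch and splitConditions are illustrative names, not the plugin's actual code):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (not the plugin's actual code): derive per-split WHERE
 *  predicates from the bounding query's min/max for a numeric split-by field. */
public class SplitSketch {
    /** Returns one "field >= lo AND field <= hi" predicate per split. */
    static List<String> splitConditions(String field, long min, long max, int numSplits) {
        List<String> conditions = new ArrayList<>();
        // Size each split's range; round up so [min, max] is fully covered.
        long range = max - min + 1;
        long step = (range + numSplits - 1) / numSplits;
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);
            conditions.add(field + " >= " + lo + " AND " + field + " <= " + hi);
        }
        return conditions;
    }

    public static void main(String[] args) {
        // e.g. a bounding query such as "SELECT MIN(id), MAX(id) FROM users"
        // (hypothetical table) returned 1 and 100, with 4 splits requested.
        for (String c : splitConditions("id", 1, 100, 4)) {
            System.out.println(c);
        }
        // id >= 1 AND id <= 25 ... id >= 76 AND id <= 100
    }
}
```

Each predicate would then be substituted into the import query so every split reads a disjoint slice of the table.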


Sink Properties

User Facing Name | Type | Description | Constraints
Label | String | Label for UI |
Reference Name | String | Uniquely identifying name used for lineage | Required
Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres)
Host | String | URL of the current master instance of the PostgreSQL cluster | Required
Port | Number | Port of the PostgreSQL cluster's master instance | Optional (defaults to 5432)
Database | String | Name of the database to connect to | Required
Username | String | DB username | Required
Password | Password | User password | Required
Connection Arguments | Key-value | A list of arbitrary string key/value pairs used as connection arguments; see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters |
Table Name | String | Name of the database table to write to | Required
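The Host, Port, Database, and Connection Arguments properties would typically be combined into a standard PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database?arg=value. A minimal sketch of that assembly (ConnectionUrlSketch and buildUrl are hypothetical helpers, not the plugin's API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative sketch (not the plugin's actual code): combining the sink's
 *  host, port, database, and connection-argument properties into a JDBC URL. */
public class ConnectionUrlSketch {
    static String buildUrl(String host, int port, String database, Map<String, String> args) {
        // Standard PostgreSQL JDBC URL: jdbc:postgresql://host:port/database?k=v&k=v
        StringBuilder url = new StringBuilder("jdbc:postgresql://")
                .append(host).append(':').append(port).append('/').append(database);
        if (args != null && !args.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> e : args.entrySet()) {
                query.add(e.getKey() + "=" + e.getValue());
            }
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical Aurora cluster endpoint and connection arguments.
        Map<String, String> connArgs = new LinkedHashMap<>();
        connArgs.put("ssl", "true");
        connArgs.put("connectTimeout", "10");
        System.out.println(buildUrl("mycluster.cluster-abc.us-east-1.rds.amazonaws.com",
                5432, "mydb", connArgs));
        // jdbc:postgresql://mycluster.cluster-abc.us-east-1.rds.amazonaws.com:5432/mydb?ssl=true&connectTimeout=10
    }
}
```

The accepted argument names are those documented by the PostgreSQL JDBC driver (linked in the tables above).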


Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2