Introduction

This document describes a separate database plugin that supports MariaDB-specific features and configurations.

Use-cases


  • Users should not have to specify any redundant configuration (for example, the JDBC type in the source plugin or the columns in the sink plugin)
  • Users should get field-level lineage for the source and sink that are being used
  • Reference documentation should be updated to account for the changes
  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines
  • Users can choose and install the MariaDB source and sink plugins
  • Users should see the MariaDB logo on the plugin configuration page for a better experience
  • Users should get relevant information from the tooltips
    • The tooltip for the connection string should be customized specifically for the MariaDB database
    • Each tooltip should accurately describe what its field is used for


User Stories

  • Users should have each tooltip accurately describe what its field does

  • Users should learn the format of the MariaDB connection string by hovering over the tooltip for the connection string

  • Users should get field-level lineage information for the MariaDB source and sink

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get an updated reference document for the MariaDB source and sink

  • Users should be able to read all the DB types

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Design / Implementation Tips

MariaDB Connector/J reference: https://mariadb.com/kb/en/library/mariadb-connector-j/

MariaDB Connector/J repository: https://github.com/MariaDB/mariadb-connector-j

MariaDB data types mapping and conversions: https://github.com/MariaDB/mariadb-connector-j/blob/master/src/main/java/org/mariadb/jdbc/internal/ColumnType.java#L64-L92

Connecting Securely Using SSL

The Connector/J client can be configured to use SSL with the following steps:

1) Import the server certificate into the default Java trust-store (although modifying the default trust-store is not recommended) or into a custom Java trust-store file. Use the trustStore property to point the driver to the trust-store that holds the trusted root certificate.

2) Generate the client private key and certificate, or use the key and certificate files generated by the MariaDB server. Convert the client key and certificate files to a PKCS #12 archive and import the archive into a Java keystore. Use the keyStore property to point the driver to the client certificate key-store.

3) Use the keyStorePassword and trustStorePassword properties to specify the passwords for the client and trusted certificate key-stores.

See: https://mariadb.com/kb/en/library/using-tls-ssl-with-mariadb-java-connector/
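
The steps above can be sketched as a set of driver properties. This is a minimal illustration, not the plugin's actual code: the class and method names are hypothetical, and the option names (useSSL, trustStore, trustStorePassword, keyStore, keyStorePassword) are assumed to match the Connector/J version in use, so they should be checked against the page linked above.

```java
import java.util.Properties;

// Hypothetical helper: collects the SSL-related Connector/J options
// described in steps 1-3 into a java.util.Properties object.
final class MariaDbSslProperties {

  static Properties build(String trustStoreUrl, String trustStorePassword,
                          String keyStoreUrl, String keyStorePassword) {
    Properties props = new Properties();
    // Turn SSL on; the stores below are optional and fall back to JVM defaults.
    props.setProperty("useSSL", "true");
    putIfPresent(props, "trustStore", trustStoreUrl);
    putIfPresent(props, "trustStorePassword", trustStorePassword);
    putIfPresent(props, "keyStore", keyStoreUrl);
    putIfPresent(props, "keyStorePassword", keyStorePassword);
    return props;
  }

  // Skips unset values so that the driver defaults apply.
  private static void putIfPresent(Properties props, String key, String value) {
    if (value != null && !value.isEmpty()) {
      props.setProperty(key, value);
    }
  }

  public static void main(String[] args) {
    // Example paths and passwords are placeholders.
    Properties props = build("file:///etc/ssl/truststore.jks", "changeit", null, null);
    System.out.println(props.stringPropertyNames());
  }
}
```

The resulting Properties object could then be passed to java.sql.DriverManager.getConnection(url, props) together with the user name and password.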

Support of the 'Use ANSI quotes to quote identifiers' property

This property should not be implemented via a JDBC URL parameter, since that would override the default SQL_MODE system variable instead of appending 'ANSI_QUOTES' to its value. A proper implementation has to read the default SQL_MODE, append 'ANSI_QUOTES', and update the value using a "SET SESSION sql_mode = 'modes';" statement.
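
The read, append, and update sequence could look roughly like the sketch below. The class and method names are hypothetical, and reading the current value with SELECT @@SESSION.sql_mode is an assumption about how the plugin would obtain it; the string handling is the part that matters.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: appends ANSI_QUOTES to the session sql_mode
// without discarding the modes that are already set.
final class AnsiQuotesSqlMode {

  // Returns the comma-separated mode list with ANSI_QUOTES added if missing.
  static String appendAnsiQuotes(String currentSqlMode) {
    List<String> modes = new ArrayList<>();
    boolean present = false;
    if (currentSqlMode != null) {
      for (String mode : currentSqlMode.split(",")) {
        String trimmed = mode.trim();
        if (trimmed.isEmpty()) {
          continue;
        }
        if (trimmed.toUpperCase(Locale.ROOT).equals("ANSI_QUOTES")) {
          present = true;
        }
        modes.add(trimmed);
      }
    }
    if (!present) {
      modes.add("ANSI_QUOTES");
    }
    return String.join(",", modes);
  }

  // Builds the statement named in the text above.
  static String setSessionStatement(String modes) {
    return "SET SESSION sql_mode = '" + modes + "'";
  }

  // Reads the current session value, appends ANSI_QUOTES, and writes it back.
  static void enableAnsiQuotes(Connection connection) throws SQLException {
    try (Statement statement = connection.createStatement()) {
      String current = "";
      try (ResultSet rs = statement.executeQuery("SELECT @@SESSION.sql_mode")) {
        if (rs.next() && rs.getString(1) != null) {
          current = rs.getString(1);
        }
      }
      statement.execute(setSessionStatement(appendAnsiQuotes(current)));
    }
  }
}
```

Because the mode list comes from the server rather than from user input, it is concatenated into the statement directly in this sketch.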


Design

Currently, MariaDB 5.5.3 and later are supported. We suggest using MariaDB Connector/J 2.4, since it is the latest stable release and supports all the new features of recent releases.


Sink Properties


| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Reference Name | String | Name that uniquely identifies this sink for lineage | Required |
| Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which MariaDB is listening | Optional (default 3306) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Table Name | String | Name of the database table to write to | Required |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
| SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Keystore password | Password | Password for the client certificates KeyStore | |
| SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
| Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed to the driver as connection arguments. See https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
| Advanced | Transaction Isolation Level | Select | Transaction isolation level for queries run by this sink | |
| Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
| Advanced | SQL_MODE | String | Override the default SQL_MODE session variable used by the server | |
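
The Host, Port, Database, and Connection Arguments properties above would typically be combined into a Connector/J connection string of the form jdbc:mariadb://host:port/database?key=value. The sketch below shows one way to assemble it; the class and method names are hypothetical and not taken from the plugin code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a MariaDB JDBC URL from the plugin properties.
final class MariaDbConnectionString {

  // Default port stated in the properties table.
  static final int DEFAULT_PORT = 3306;

  static String build(String host, Integer port, String database,
                      Map<String, String> connectionArguments) {
    if (host == null || host.isEmpty()) {
      throw new IllegalArgumentException("Host is required");
    }
    if (database == null || database.isEmpty()) {
      throw new IllegalArgumentException("Database is required");
    }
    int effectivePort = (port == null) ? DEFAULT_PORT : port;
    StringBuilder url = new StringBuilder("jdbc:mariadb://")
        .append(host).append(':').append(effectivePort)
        .append('/').append(database);
    // Connection arguments are appended as key=value pairs after '?'.
    if (connectionArguments != null && !connectionArguments.isEmpty()) {
      StringJoiner query = new StringJoiner("&", "?", "");
      for (Map.Entry<String, String> entry : connectionArguments.entrySet()) {
        query.add(entry.getKey() + "=" + entry.getValue());
      }
      url.append(query.toString());
    }
    return url.toString();
  }

  public static void main(String[] args) {
    Map<String, String> arguments = new LinkedHashMap<>();
    arguments.put("useCompression", "true");
    System.out.println(build("localhost", null, "sales", arguments));
  }
}
```

Deriving the URL from named fields is what lets users avoid typing the connection string by hand.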




Source Properties



| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Reference Name | String | Name that uniquely identifies this source for lineage | Required |
| Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which MariaDB is listening | Optional (default 3306) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Import Query | String | Query used to import data | Valid SQL query |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
| SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Keystore password | Password | Password for the client certificates KeyStore | |
| SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
| Advanced | Bounding Query | String | Returns the max and min of the Split-By Field | Valid SQL query |
| Advanced | Split-By Field Name | String | Name of the field used to generate splits | |
| Advanced | Number of Splits to Generate | Number | Number of splits to generate | |
| Advanced | Transaction Isolation Level | Select | Transaction isolation level for queries run by this source | |
| Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed to the driver as connection arguments. See https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
| Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
| Advanced | SQL_MODE | String | Override the default SQL_MODE session variable used by the server | |
| Advanced | Use ANSI quotes to quote identifiers | Toggle | Treats " as an identifier quote character and not as a string quote character | |
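
To illustrate how the Bounding Query, Split-By Field Name, and Number of Splits to Generate properties relate, the sketch below divides the min/max range returned by a bounding query into contiguous ranges. It is an assumption about the general technique, not the algorithm the plugin or the underlying input format actually uses, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: splits the inclusive range [min, max] of a
// numeric split-by field into at most numSplits contiguous ranges.
final class SplitRanges {

  // Each element is {lowerInclusive, upperInclusive}.
  static List<long[]> split(long min, long max, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be at least 1");
    }
    if (max < min) {
      throw new IllegalArgumentException("max must not be less than min");
    }
    // Assumes the range is small enough that max - min + 1 does not overflow.
    long total = max - min + 1;
    long splits = Math.min(numSplits, total);
    long base = total / splits;
    long remainder = total % splits;

    List<long[]> ranges = new ArrayList<>();
    long start = min;
    for (long i = 0; i < splits; i++) {
      // The first 'remainder' splits take one extra value each.
      long size = base + (i < remainder ? 1 : 0);
      long end = start + size - 1;
      ranges.add(new long[] {start, end});
      start = end + 1;
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (long[] range : split(1, 10, 3)) {
      System.out.println(range[0] + " .. " + range[1]);
    }
  }
}
```

Each range could then be turned into a WHERE clause on the split-by field so that the import query is executed once per split.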



Action Properties


| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which MariaDB is listening | Optional (default 3306) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Database Query | String | Database command to run | Valid SQL query |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
| SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Keystore password | Password | Password for the client certificates KeyStore | |
| SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
| SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
| Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed to the driver as connection arguments. See https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
| Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
| Advanced | SQL_MODE | String | Override the default SQL_MODE session variable used by the server | |
| Advanced | Use ANSI quotes to quote identifiers | Toggle | Treats " as an identifier quote character and not as a string quote character | |



Data Types Mapping

| MariaDB Data Type | CDAP Schema Data Type | Support | Comment |
|---|---|---|---|
| TINYINT | Schema.Type.INT | + | |
| BOOLEAN, BOOL | Schema.Type.BOOLEAN | + | |
| SMALLINT | Schema.Type.INT | + | |
| MEDIUMINT | Schema.Type.INT | + | |
| INT, INTEGER | Schema.Type.INT | + | |
| BIGINT | Schema.Type.LONG | + | |
| DECIMAL, DEC, NUMERIC, FIXED | Schema.LogicalType.DECIMAL | + | |
| FLOAT | Schema.Type.FLOAT | + | |
| DOUBLE, DOUBLE PRECISION, REAL | Schema.LogicalType.DECIMAL | + | |
| BIT | Schema.Type.BOOLEAN | + | |
| CHAR | Schema.Type.STRING | + | |
| VARCHAR | Schema.Type.STRING | + | |
| BINARY | Schema.Type.BYTES | + | |
| CHAR BYTE | Schema.Type.BYTES | + | |
| VARBINARY | Schema.Type.BYTES | + | |
| TINYBLOB | Schema.Type.BYTES | + | |
| BLOB | Schema.Type.BYTES | + | |
| MEDIUMBLOB | Schema.Type.BYTES | + | |
| LONGBLOB | Schema.Type.BYTES | + | |
| TINYTEXT | Schema.Type.STRING | + | |
| TEXT | Schema.Type.STRING | + | |
| MEDIUMTEXT | Schema.Type.STRING | + | |
| LONGTEXT | Schema.Type.STRING | + | |
| JSON | Schema.Type.STRING | + | In MariaDB it is an alias for LONGTEXT |
| ENUM | Schema.Type.STRING | + | No such type in java.sql.Types, mapping to String by default |
| SET | Schema.Type.STRING | + | |
| DATE | Schema.Type.DATE | + | |
| TIME | Schema.LogicalType.TIME_MICROS | + | |
| DATETIME | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| YEAR | Schema.Type.DATE | + | |
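
A lookup over a subset of the mapping above could be sketched as follows. The class is hypothetical, it covers only a few rows, and the CDAP types are represented as plain strings so that the example does not depend on the CDAP API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup covering a few rows of the mapping table.
final class MariaDbTypeLookup {

  private static final Map<String, String> TYPES = new LinkedHashMap<>();

  static {
    TYPES.put("TINYINT", "Schema.Type.INT");
    TYPES.put("BIGINT", "Schema.Type.LONG");
    TYPES.put("VARCHAR", "Schema.Type.STRING");
    TYPES.put("BLOB", "Schema.Type.BYTES");
    // JSON is an alias for LONGTEXT in MariaDB, so it maps to a string.
    TYPES.put("JSON", "Schema.Type.STRING");
    // ENUM has no counterpart in java.sql.Types and defaults to a string.
    TYPES.put("ENUM", "Schema.Type.STRING");
    TYPES.put("TIME", "Schema.LogicalType.TIME_MICROS");
    TYPES.put("DATETIME", "Schema.LogicalType.TIMESTAMP_MICROS");
  }

  // Returns the CDAP schema type name for a MariaDB type name (case-insensitive).
  static String cdapTypeFor(String mariaDbType) {
    String key = mariaDbType.trim().toUpperCase(Locale.ROOT);
    String result = TYPES.get(key);
    if (result == null) {
      throw new IllegalArgumentException("Unmapped MariaDB type: " + mariaDbType);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(cdapTypeFor("datetime"));
  }
}
```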


Approach

Create a mariadb-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add MariaDB-specific properties to the configuration and add support for MariaDB-specific data types. Update the UI widget JSON definitions.

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key-value pairs. The MariaDB source and sink appear as separate entries, with the MariaDB logo, in the source and sink lists.

Test Case(s)

TODO

Sample Pipeline

TODO



Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature