MySQL database plugin

Introduction

A separate database plugin to support MySQL-specific features and configurations.

Use-Case

  • Users can choose and install MySQL source and sink plugins.
  • Users should see the MySQL logo on the plugin configuration page for a better experience.
  • Users should get relevant information from the tool tips:
    • The tool tip for the connection string should be customized specifically for the MySQL database.
    • Each tool tip should accurately describe what its field is used for.
  • Users should get performance comparable to Sqoop, achieved by using Sqoop libraries for data ingestion and egress.
  • Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin).
  • Users should get field-level lineage for the source and sink being used.
  • Reference documentation should be updated to account for the changes.
  • The source code for the MySQL database plugin should be placed in a repo under the data-integrations org.
  • Integration tests for the MySQL database plugin should be added in the test repo.
  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

  • Users should be able to install MySQL-specific database source and sink plugins from the Hub.
  • Users should have each tool tip accurately describe what its field does.
  • Users should learn the format of the MySQL connection string by hovering over the tool tip for the connection string.
  • Users should get field-level lineage information for the MySQL source and sink.
  • Users should get performance comparable to Sqoop when ingesting data from MySQL and when writing data to MySQL (within ~15% of the time taken by Sqoop).
  • Users should be able to set up a pipeline without specifying redundant information.
  • Users should get updated reference documentation for the MySQL source and sink.
  • Users should be able to read all the DB types.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design Tips

MySQL Connector/J 8.0 reference: https://dev.mysql.com/doc/connector-j/8.0/en/

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

MySQL datatypes mappings and conversions: https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-type-conversions.html


Connecting Securely Using SSL
Configuring the Connector/J client to use SSL can be accomplished with the following steps:
1) Import the server certificate either into the default Java truststore (although tampering with the default truststore is not recommended) or into a custom Java truststore file. Use the 'trustCertificateKeyStoreUrl' property to point the driver to the trusted root certificate keystore.
2) Generate the client private key and certificate, or use the key and certificate files generated by the MySQL server. Convert the client key and certificate files to a PKCS #12 archive and import the archive into a Java keystore. Use the 'clientCertificateKeyStoreUrl' property to point the driver to the client certificate keystore.
3) Use the 'clientCertificateKeyStorePassword' and 'trustCertificateKeyStorePassword' properties to specify the passwords for the client and trusted certificate keystores.

See:
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html
https://dev.mysql.com/doc/refman/8.0/en/encrypted-connections.html
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-using-ssl.html
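
As an illustration of the steps above, the following language-neutral sketch (written in Python, although the plugin itself would be Java) shows how the four keystore properties might be collected into a set of Connector/J connection properties. The function name `build_ssl_properties` is hypothetical, and the use of `useSSL` and `requireSSL` to switch SSL on is an assumption; the design does not state which property the plugin would set for that purpose.

```python
from typing import Dict, Optional


def build_ssl_properties(
    keystore_url: Optional[str] = None,
    keystore_password: Optional[str] = None,
    truststore_url: Optional[str] = None,
    truststore_password: Optional[str] = None,
) -> Dict[str, str]:
    """Collect Connector/J properties for an SSL-only connection (hypothetical helper)."""
    # Assumption: 'useSSL' and 'requireSSL' are the switches used to make
    # the connection fail when SSL is not available.
    props = {"useSSL": "true", "requireSSL": "true"}

    # Keystore properties named in the steps above; unset values are
    # omitted so that the driver falls back to its defaults.
    optional = {
        "clientCertificateKeyStoreUrl": keystore_url,
        "clientCertificateKeyStorePassword": keystore_password,
        "trustCertificateKeyStoreUrl": truststore_url,
        "trustCertificateKeyStorePassword": truststore_password,
    }
    props.update({name: value for name, value in optional.items() if value})
    return props
```

Leaving out unset values keeps the "if not specified, use defaults" behaviour with the driver rather than re-implementing it in the plugin.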

Support of the 'Use ANSI quotes to quote identifiers' property
This property should not be implemented via a JDBC URL parameter, since that would override the default SQL_MODE system variable instead of appending 'ANSI_QUOTES' to its value. A proper implementation has to read the default SQL_MODE (which depends on the version of the MySQL Server), append 'ANSI_QUOTES', and update the value using a "SET SESSION sql_mode = 'modes';" statement.
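
The read-append-update sequence can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the plugin's actual code: the function names are hypothetical, and it assumes the current mode string has already been fetched from the server (for example with `SELECT @@SESSION.sql_mode`).

```python
def with_ansi_quotes(current_sql_mode: str) -> str:
    """Append ANSI_QUOTES to a comma-separated sql_mode value, keeping existing modes."""
    modes = [mode.strip() for mode in current_sql_mode.split(",") if mode.strip()]
    # Do not add the mode twice if the server default already contains it.
    if "ANSI_QUOTES" not in (mode.upper() for mode in modes):
        modes.append("ANSI_QUOTES")
    return ",".join(modes)


def set_session_sql_mode_statement(current_sql_mode: str) -> str:
    """Build the statement that applies the extended mode to the current session."""
    return "SET SESSION sql_mode = '{}'".format(with_ansi_quotes(current_sql_mode))
```

For a server whose session mode is `ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES`, the generated statement keeps both existing modes and adds `ANSI_QUOTES` at the end, so none of the server defaults are lost.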

Design

Currently the two major MySQL versions supported are 5 and 8. We suggest using MySQL Connector/J 8.0 since it is backward compatible with older versions of MySQL and supports all the new features of recent releases.

The suggestion is to move the database-plugins module from the hydrator-plugins repository to a database-plugins repo in the data-integrations organization, as described in Plugins Repo Split. There is existing code in database-plugins that may be reused for the MySQL plugin. We suggest creating a multi-module Maven project in which the existing `database-plugins` becomes a common functionality module for all subsequent DB plugins, and each plugin for a specific database (in this case MySQL) depends on it. Having each DB plugin in a dedicated module allows us to create separately deliverable artifacts, so users can upload only the plugins they need.

Sink Properties

| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | |
| Host | String | MySQL host | Required (defaults to localhost on UI) |
| Port | Number | Port on which MySQL is running | Optional (default 3306) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Transaction Isolation Level | Select | Transaction isolation level for queries run by this sink | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html | |
| Table Name | String | Name of a database table to write to | |
| Use SSL | Select | Turns on SSL encryption. The connection will fail if SSL is not available. | |
| Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Keystore password | String | Password for the client certificates KeyStore. | |
| Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Truststore password | String | Password for the trusted root certificates KeyStore. | |
| Use compression protocol | Boolean | Use zlib compression when communicating with the server. Select this option for WAN connections. | |
| SQL_MODE | String | Override the default SQL_MODE session variable used by the server. | |
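
To make the relationship between these properties and the driver concrete, here is a hedged sketch of how Host, Port, Database, and Connection Arguments might be combined into a Connector/J connection string of the form `jdbc:mysql://host:port/database?key=value`. It is a language-neutral illustration in Python; the function name `build_connection_string` and the example values are hypothetical, and the real plugin may pass connection arguments as driver properties instead of URL parameters.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def build_connection_string(
    database: str,
    host: str = "localhost",   # UI default from the table above
    port: int = 3306,          # default MySQL port from the table above
    connection_arguments: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a MySQL JDBC URL from the named plugin properties (hypothetical helper)."""
    url = "jdbc:mysql://{}:{}/{}".format(host, port, database)
    if connection_arguments:
        # Each key/value pair becomes one URL parameter, URL-encoded.
        url += "?" + urlencode(connection_arguments)
    return url
```

Because the URL is derived from named fields, the user never has to type the `jdbc:mysql://` prefix or remember the parameter syntax, which is the redundancy the use cases aim to remove.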

Source Properties


| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | |
| Host | String | MySQL host | Required (defaults to localhost on UI) |
| Port | Number | Port on which MySQL is running | Optional (default 3306) |
| Database | String | Name of the database to connect to | Required |
| Import Query | String | Query used to import data | Valid SQL query |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Bounding Query | String | Returns the max and min of the split-by field | Valid SQL query |
| Split-By Field Name | String | Field name which will be used to generate splits | |
| Number of Splits to Generate | Number | Number of splits to generate | |
| Transaction Isolation Level | Select | Transaction isolation level for queries run by this source | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html | |
| Use SSL | Select | Turns on SSL encryption. The connection will fail if SSL is not available. | |
| Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Keystore password | String | Password for the client certificates KeyStore. | |
| Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Truststore password | String | Password for the trusted root certificates KeyStore. | |
| Use compression protocol | Boolean | Use zlib compression when communicating with the server. Select this option for WAN connections. | |
| SQL_MODE | String | Override the default SQL_MODE session variable used by the server. | |
| Use ANSI quotes to quote identifiers | Boolean | Treats " as an identifier quote character and not as a string quote character. | |
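
The Bounding Query, Split-By Field Name, and Number of Splits to Generate properties work together: the bounding query yields the minimum and maximum of the split-by field, and that range is divided into the requested number of parts. The sketch below illustrates the idea for an integer field only. It is an assumption about how splits could be derived, not a description of the plugin's implementation (which may delegate this to an existing input format), and the function names are hypothetical.

```python
from typing import List, Tuple


def integer_splits(min_value: int, max_value: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_value, max_value] into contiguous inclusive ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")

    total = max_value - min_value + 1
    # Never create more splits than there are distinct values.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)

    splits = []
    lower = min_value
    for index in range(num_splits):
        # The first 'extra' splits take one additional value each.
        size = base + (1 if index < extra else 0)
        upper = lower + size - 1
        splits.append((lower, upper))
        lower = upper + 1
    return splits


def split_conditions(field: str, splits: List[Tuple[int, int]]) -> List[str]:
    """Turn each range into a WHERE-clause fragment on the split-by field."""
    return ["{f} >= {lo} AND {f} <= {hi}".format(f=field, lo=lo, hi=hi) for lo, hi in splits]
```

For example, with a bounding query result of 1 and 10 and three splits requested, the ranges are 1 to 4, 5 to 7, and 8 to 10, and each range would be read by a separate task.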


Action Properties


| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Host | String | MySQL host | Required (defaults to localhost on UI) |
| Port | Number | Port on which MySQL is running | Optional (default 3306) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html | |
| Database Command | String | Database command to run | Valid SQL query |
| Use SSL | Select | Turns on SSL encryption. The connection will fail if SSL is not available. | |
| Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Keystore password | String | Password for the client certificates KeyStore. | |
| Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running. | |
| Truststore password | String | Password for the trusted root certificates KeyStore. | |
| Use compression protocol | Boolean | Use zlib compression when communicating with the server. Select this option for WAN connections. | |
| SQL_MODE | String | Override the default SQL_MODE session variable used by the server. | |
| Use ANSI quotes to quote identifiers | Boolean | Treats " as an identifier quote character and not as a string quote character. | |


Data Types Mapping

| MySQL Data Type | CDAP Schema Data Type | Support | Comment |
|---|---|---|---|
| BIT | Schema.Type.BOOLEAN | + | |
| TINYINT | Schema.Type.INT | + | |
| BOOL, BOOLEAN | Schema.Type.BOOLEAN | + | |
| SMALLINT | Schema.Type.INT | + | |
| MEDIUMINT | Schema.Type.INT | + | |
| INT, INTEGER | Schema.Type.INT | + | |
| BIGINT | Schema.Type.LONG | + | |
| FLOAT | Schema.Type.FLOAT | + | |
| DOUBLE | Schema.Type.DOUBLE | + | |
| DECIMAL | Schema.LogicalType.DECIMAL | + | |
| DATE | Schema.LogicalType.DATE | + | |
| DATETIME | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| TIME | Schema.LogicalType.TIME_MICROS | + | |
| YEAR | Schema.LogicalType.DATE | + | |
| CHAR | Schema.Type.STRING | + | |
| VARCHAR | Schema.Type.STRING | + | |
| BINARY | Schema.Type.BYTES | + | |
| VARBINARY | Schema.Type.BYTES | + | |
| TINYBLOB | Schema.Type.BYTES | + | |
| TINYTEXT | Schema.Type.STRING | + | |
| BLOB | Schema.Type.BYTES | + | |
| TEXT | Schema.Type.STRING | + | |
| MEDIUMBLOB | Schema.Type.BYTES | + | |
| MEDIUMTEXT | Schema.Type.STRING | + | |
| LONGBLOB | Schema.Type.BYTES | + | |
| LONGTEXT | Schema.Type.STRING | + | |
| ENUM | Schema.Type.STRING | * | No such type in java.sql.Types; mapped to String by default |
| SET | Schema.Type.STRING | + | |


Approach

Create a mysql-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add MySQL-specific properties to the configuration and add support for MySQL-specific data types. Update the UI widget JSON definitions.

Pipeline Samples

CampaignPipeline-cdap-data-pipeline.json


API changes

Deprecated Programmatic APIs

database-plugins is moved to Data Integrations

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The MySQL source and sink appear as separate entries, with the MySQL logo, in the source and sink lists.

Test Scenarios

TODO

Releases

Release X.Y.Z

Related Work

Database plugin enhancements

Future work

PostgreSQL database plugin