MariaDB database plugin
- Dmytro Grinenko
Introduction
A separate database plugin to support MariaDB-specific features and configurations
Use-cases
- Users should not have to specify any redundant configuration (e.g., the JDBC type in the source plugin, columns in the sink plugin)
- Users should get field level lineage for the source and sink that is being used
- Reference documentation should be updated to account for the changes
- The data pipeline using source and sink plugins should run on both MapReduce and Spark engines
- Users can choose and install MariaDB source and sink plugins
- Users should see the MariaDB logo on the plugin configuration page for a better experience
- Users should get relevant information from the tooltip
- The tooltip for the connection string should be customized specifically to the MariaDB database
- The tooltip should describe accurately what each field is used for
User Stories
Users should see tooltips that accurately describe what each field does
Users should learn the format of the MariaDB connection string by hovering over the tooltip for the connection string
Users should get field level lineage information for the MariaDB source and sink
Users should be able to set up a pipeline without specifying redundant information
Users should get an updated reference document for the MariaDB source and sink
Users should be able to read all the DB types
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design / Implementation Tips
MariaDB Connector/J reference: https://mariadb.com/kb/en/library/mariadb-connector-j/
MariaDB Connector/J repository: https://github.com/MariaDB/mariadb-connector-j
MariaDB data types mapping and conversions: https://github.com/MariaDB/mariadb-connector-j/blob/master/src/main/java/org/mariadb/jdbc/internal/ColumnType.java#L64-L92
Connecting Securely Using SSL
Configuring the Connector/J client to use SSL can be accomplished with the following steps:
1) Import the server certificate into the Java default trust-store (although tampering with the default trust-store is not recommended) or into a custom Java trust-store file. Use the trustStore property to point the driver to the trusted root certificate key-store.
2) Generate the client private key and certificate, or use the key and certificate files generated by the MariaDB server. Convert the client key and certificate files to a PKCS #12 archive and import the archive into a Java keystore. Use the keyStore property to point the driver to the client certificate key-store.
3) Use the keyStorePassword and trustStorePassword properties to specify the passwords for the client and trusted certificate key-stores.
See: https://mariadb.com/kb/en/library/using-tls-ssl-with-mariadb-java-connector/
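The steps above can be sketched as a small helper that collects the SSL-related options into JDBC connection properties. This is a minimal illustration, not the plugin's implementation: the class and method names are hypothetical, and the property names (useSSL, trustStore, trustStorePassword, keyStore, keyStorePassword) are assumed to match the Connector/J 2.x options described on the page linked above.

```java
import java.util.Properties;

/** Hypothetical helper: gathers Connector/J SSL options into connection properties. */
final class MariaDbSslProperties {

  private MariaDbSslProperties() { }

  /**
   * Builds properties for an SSL-enabled connection.
   * A null or empty value is skipped so the driver falls back to its defaults.
   */
  static Properties build(String trustStoreUrl, String trustStorePassword,
                          String keyStoreUrl, String keyStorePassword) {
    Properties props = new Properties();
    props.setProperty("useSSL", "true");
    // Step 1: trusted root certificate key-store
    putIfPresent(props, "trustStore", trustStoreUrl);
    // Step 2: client certificate key-store
    putIfPresent(props, "keyStore", keyStoreUrl);
    // Step 3: passwords for both stores
    putIfPresent(props, "trustStorePassword", trustStorePassword);
    putIfPresent(props, "keyStorePassword", keyStorePassword);
    return props;
  }

  private static void putIfPresent(Properties props, String key, String value) {
    if (value != null && !value.isEmpty()) {
      props.setProperty(key, value);
    }
  }
}
```

The resulting Properties object could then be passed, together with the user and password entries, to `DriverManager.getConnection(url, props)`.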
Support of the 'Use ANSI quotes to quote identifiers' property
This property should not be specified via a JDBC URL parameter, since that overrides the default SQL_MODE system variable instead of appending 'ANSI_QUOTES' to its value. A proper implementation has to read the default SQL_MODE, append 'ANSI_QUOTES', and update the value using a "SET SESSION sql_mode = 'modes';" statement.
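A minimal sketch of the read, append, and update sequence described above, using only plain JDBC. The class and method names are hypothetical; reading the current value with `SELECT @@SESSION.sql_mode` is one assumed way to obtain the session's mode list.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: appends ANSI_QUOTES to the session sql_mode without dropping other modes. */
final class AnsiQuotesSessionMode {

  private AnsiQuotesSessionMode() { }

  /** Returns the comma-separated mode list with ANSI_QUOTES appended if it is not already present. */
  static String withAnsiQuotes(String currentModes) {
    if (currentModes == null || currentModes.trim().isEmpty()) {
      return "ANSI_QUOTES";
    }
    for (String mode : currentModes.split(",")) {
      if (mode.trim().equalsIgnoreCase("ANSI_QUOTES")) {
        return currentModes;
      }
    }
    return currentModes + ",ANSI_QUOTES";
  }

  /** Reads the session sql_mode, appends ANSI_QUOTES, and writes the value back. */
  static void enable(Connection connection) throws SQLException {
    try (Statement statement = connection.createStatement()) {
      String current;
      try (ResultSet rs = statement.executeQuery("SELECT @@SESSION.sql_mode")) {
        current = rs.next() ? rs.getString(1) : "";
      }
      String updated = withAnsiQuotes(current);
      // Mode names should not contain quotes; doubling them is a defensive measure only.
      statement.execute("SET SESSION sql_mode = '" + updated.replace("'", "''") + "'");
    }
  }
}
```

Keeping the string manipulation in `withAnsiQuotes` separate from the JDBC calls makes the append logic easy to unit test without a database.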
Design
Currently MariaDB 5.5.3 and later are supported. We suggest using MariaDB Connector/J 2.4, since it is the latest stable release and supports all the new features of recent releases.
Sink Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for UI | Required |
Basic | Reference Name | String | Unique name used to identify this sink for lineage | Required |
Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
Basic | Port | Number | Port on which MariaDB is running | Optional (default 3306) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Table Name | String | Name of the database table to write to | Required |
Credentials | Username | String | DB username | Required |
Credentials | Password | Password | User password | Required |
SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Keystore password | Password | Password for the client certificates KeyStore | |
SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. For the list of properties, see https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
Advanced | Transaction Isolation Level | Select | Transaction isolation level for queries run by this sink | |
Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
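As an illustration of how the Host, Port, Database, and Connection Arguments properties could be combined so that users never type a connection string themselves, the sketch below assembles a `jdbc:mariadb://host:port/database` URL. The class name is hypothetical, values are not URL-encoded, and `useCompression` in the usage note is assumed to be the Connector/J option behind the "Use compression protocol" toggle.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a MariaDB JDBC URL from the plugin's basic properties. */
final class MariaDbConnectionString {

  /** Default port used when the Port property is left empty. */
  static final int DEFAULT_PORT = 3306;

  private MariaDbConnectionString() { }

  /**
   * Returns jdbc:mariadb://host:port/database, followed by ?key=value&... when
   * connection arguments are supplied. Values are appended as-is (no URL encoding).
   */
  static String build(String host, Integer port, String database, Map<String, String> arguments) {
    int effectivePort = (port == null) ? DEFAULT_PORT : port;
    StringBuilder url = new StringBuilder("jdbc:mariadb://")
        .append(host).append(':').append(effectivePort).append('/').append(database);
    if (arguments != null && !arguments.isEmpty()) {
      StringJoiner query = new StringJoiner("&", "?", "");
      arguments.forEach((key, value) -> query.add(key + "=" + value));
      url.append(query.toString());
    }
    return url.toString();
  }
}
```

For example, a host of `db.example.com`, port `3307`, database `sales`, and a single argument `useCompression=true` would produce `jdbc:mariadb://db.example.com:3307/sales?useCompression=true`.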
Source Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for UI | Required |
Basic | Reference Name | String | Unique name used to identify this source for lineage | Required |
Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
Basic | Port | Number | Port on which MariaDB is running | Optional (default 3306) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Import Query | String | Query used to import data | Valid SQL query |
Credentials | Username | String | DB username | Required |
Credentials | Password | Password | User password | Required |
SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Keystore password | Password | Password for the client certificates KeyStore | |
SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
Advanced | Bounding Query | String | Query that returns the min and max of the Split-By Field | Valid SQL query |
Advanced | Split-By Field Name | String | Name of the field used to generate splits | |
Advanced | Number of Splits to Generate | Number | Number of splits to generate | |
Advanced | Transaction Isolation Level | Select | Transaction isolation level for queries run by this source | |
Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. For the list of properties, see https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
Advanced | Use ANSI quotes to quote identifiers | Toggle | Treats " as an identifier quote character and not as a string quote character | |
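To show how the Bounding Query, Split-By Field Name, and Number of Splits to Generate properties relate to each other, the sketch below divides the min and max returned by a bounding query into contiguous ranges, one per split. It is an illustration only: the class is hypothetical, it assumes an integer split-by field, and the actual splitting in the plugin may be delegated to the underlying input format and behave differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: divides [min, max] of an integer split-by field into ranges. */
final class SplitRanges {

  private SplitRanges() { }

  /**
   * Returns inclusive {lower, upper} ranges that together cover [min, max].
   * Fewer than numSplits ranges are returned when there are fewer distinct values.
   * Assumes max - min + 1 does not overflow a long.
   */
  static List<long[]> compute(long min, long max, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be at least 1");
    }
    if (min > max) {
      throw new IllegalArgumentException("min must not exceed max");
    }
    List<long[]> ranges = new ArrayList<>();
    long total = max - min + 1;
    long base = total / numSplits;
    long remainder = total % numSplits;
    long lower = min;
    for (int i = 0; i < numSplits && lower <= max; i++) {
      // The first 'remainder' splits take one extra value so sizes differ by at most one.
      long size = base + (i < remainder ? 1 : 0);
      long upper = lower + size - 1;
      ranges.add(new long[] {lower, upper});
      lower = upper + 1;
    }
    return ranges;
  }
}
```

Each range could then be turned into a `WHERE splitField >= lower AND splitField <= upper` condition appended to the import query for one split.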
Action Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for UI | Required |
Basic | Host | String | MariaDB host | Required (defaults to localhost on UI) |
Basic | Port | Number | Port on which MariaDB is running | Optional (default 3306) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Database Query | String | Database command to run | Valid SQL query |
Credentials | Username | String | DB username | Required |
Credentials | Password | Password | User password | Required |
SSL | Use SSL | Toggle | Turns on SSL encryption. The connection will fail if SSL is not available | |
SSL | Keystore URL | String | URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Keystore password | Password | Password for the client certificates KeyStore | |
SSL | Truststore URL | String | URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running | |
SSL | Truststore password | Password | Password for the trusted root certificates KeyStore | |
Advanced | Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. For the list of properties, see https://mariadb.com/kb/en/library/about-mariadb-connector-j/#connection-strings | |
Advanced | Use compression protocol | Toggle | Use zlib compression when communicating with the server. Select this option for WAN connections | |
Advanced | Use ANSI quotes to quote identifiers | Toggle | Treats " as an identifier quote character and not as a string quote character | |
Data Types Mapping
MariaDB Data Type | CDAP Schema Data Type | Support | Comment |
---|---|---|---|
TINYINT | Schema.Type.INT | + | |
BOOLEAN, BOOL | Schema.Type.BOOLEAN | + | |
SMALLINT | Schema.Type.INT | + | |
MEDIUMINT | Schema.Type.INT | + | |
INT, INTEGER | Schema.Type.INT | + | |
BIGINT | Schema.Type.LONG | + | |
DECIMAL, DEC, NUMERIC, FIXED | Schema.LogicalType.DECIMAL | + | |
FLOAT | Schema.Type.FLOAT | + | |
DOUBLE, DOUBLE PRECISION, REAL | Schema.Type.DOUBLE | + | |
BIT | Schema.Type.BOOLEAN | + | |
CHAR | Schema.Type.STRING | + | |
VARCHAR | Schema.Type.STRING | + | |
BINARY | Schema.Type.BYTES | + | |
CHAR BYTE | Schema.Type.BYTES | + | |
VARBINARY | Schema.Type.BYTES | + | |
TINYBLOB | Schema.Type.BYTES | + | |
BLOB | Schema.Type.BYTES | + | |
MEDIUMBLOB | Schema.Type.BYTES | + | |
LONGBLOB | Schema.Type.BYTES | + | |
TINYTEXT | Schema.Type.STRING | + | |
TEXT | Schema.Type.STRING | + | |
MEDIUMTEXT | Schema.Type.STRING | + | |
LONGTEXT | Schema.Type.STRING | + | |
JSON | Schema.Type.STRING | + | In MariaDB, JSON is an alias for LONGTEXT |
ENUM | Schema.Type.STRING | + | No such type in java.sql.Types, mapping to String by default |
SET | Schema.Type.STRING | + | |
DATE | Schema.Type.DATE | + | |
TIME | Schema.LogicalType.TIME_MICROS | + | |
DATETIME | Schema.LogicalType.TIMESTAMP_MICROS | + | |
TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
YEAR | Schema.Type.DATE | + |
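A lookup over a subset of the rows above could be sketched as follows. To stay self-contained, the CDAP schema types are represented as plain strings rather than CDAP API objects, and the class and method names are hypothetical; a real implementation would more likely switch on the JDBC type reported by the driver's result set metadata.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup: MariaDB type name to the CDAP schema type listed in the table (subset only). */
final class MariaDbTypeMapping {

  private static final Map<String, String> CDAP_TYPE_BY_MARIADB_TYPE = new HashMap<>();

  static {
    register("Schema.Type.INT", "TINYINT", "SMALLINT", "MEDIUMINT", "INT", "INTEGER");
    register("Schema.Type.LONG", "BIGINT");
    register("Schema.Type.BOOLEAN", "BOOLEAN", "BOOL", "BIT");
    register("Schema.Type.STRING", "CHAR", "VARCHAR", "TEXT", "JSON", "ENUM", "SET");
    register("Schema.Type.BYTES", "BINARY", "VARBINARY", "BLOB");
    register("Schema.LogicalType.TIMESTAMP_MICROS", "DATETIME", "TIMESTAMP");
  }

  private MariaDbTypeMapping() { }

  private static void register(String cdapType, String... mariaDbTypes) {
    for (String mariaDbType : mariaDbTypes) {
      CDAP_TYPE_BY_MARIADB_TYPE.put(mariaDbType, cdapType);
    }
  }

  /** Returns the CDAP schema type name for a MariaDB type name (case-insensitive). */
  static String cdapTypeFor(String mariaDbType) {
    String cdapType = CDAP_TYPE_BY_MARIADB_TYPE.get(mariaDbType.trim().toUpperCase(Locale.ROOT));
    if (cdapType == null) {
      throw new IllegalArgumentException("No mapping registered for type: " + mariaDbType);
    }
    return cdapType;
  }
}
```

Throwing on an unregistered type, rather than defaulting to a string, keeps unsupported columns from being silently coerced.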
Approach
Create a mariadb-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add MariaDB-specific properties to the configuration and add support for MariaDB-specific data types. Update the UI widget JSON definitions.
UI Impact or Changes
Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The MariaDB source and sink appear as separate entries, with the MariaDB logo, in the source and sink lists.
Test Case(s)
TODO
Sample Pipeline
TODO
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature