Introduction
A separate database plugin to support MongoDB-specific features and configurations.
...
- Users can choose and install MongoDB source and sink plugins.
- Users should see the MongoDB logo on the plugin configuration page for a better experience.
- Users should get relevant information from the tooltip:
  - The tooltip should accurately describe what each field is used for.
- Users should not have to specify any redundant configuration.
- Users should get field-level lineage for the source and sink being used.
- Reference documentation should be updated to account for the changes.
- The source code for the MongoDB database plugin should be placed in a repo under the data-integrations org.
- Integration tests for the MongoDB database plugin should be added in the test repo.
- Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.
...
MongoDB driver reference: http://mongodb.github.io/mongo-java-driver/3.10/driver/
Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins
Design
The suggestion is to create a Maven sub-module MongoDB under the database-plugins repository. Move the existing mongodb-plugins module to the mongodb-plugins repository.
Sink Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI. | |
Reference Name | String | Uniquely identified name for lineage. | |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI) |
Port | Number | Port that MongoDB is listening on. | Optional (default 27017) |
Database | String | MongoDB database name. | Required |
Collection | String | Name of the database collection to write to. | Required |
Username | String | User identity for connecting to the specified database. | |
Password | Password | Password to use to connect to the specified database. | |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. | |
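The sink properties above map naturally onto a standard MongoDB connection string of the form `mongodb://[username:password@]host[:port]/[database][?options]`. The following is a minimal sketch of that assembly, not the plugin's actual code: the class name `MongoConnectionStringSketch` and the method `build` are hypothetical, and the sketch assumes that an unset port falls back to 27017 and that credentials and connection arguments are percent-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of the plugin source. */
class MongoConnectionStringSketch {

  static final int DEFAULT_PORT = 27017;

  /** Percent-encodes a value so it is safe inside a connection string component. */
  static String encode(String value) {
    try {
      // URLEncoder targets HTML forms and emits '+' for spaces; convert those to %20.
      return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always available", e);
    }
  }

  /** Builds mongodb://[username:password@]host:port/database[?key=value&...]. */
  static String build(String host, Integer port, String database,
                      String username, String password,
                      Map<String, String> connectionArguments) {
    StringBuilder uri = new StringBuilder("mongodb://");
    if (username != null && !username.isEmpty()) {
      uri.append(encode(username));
      if (password != null && !password.isEmpty()) {
        uri.append(':').append(encode(password));
      }
      uri.append('@');
    }
    // Assumption: an unset port falls back to the documented default.
    uri.append(host).append(':').append(port == null ? DEFAULT_PORT : port);
    uri.append('/').append(database);
    if (connectionArguments != null && !connectionArguments.isEmpty()) {
      StringJoiner options = new StringJoiner("&", "?", "");
      for (Map.Entry<String, String> argument : connectionArguments.entrySet()) {
        options.add(encode(argument.getKey()) + "=" + encode(argument.getValue()));
      }
      uri.append(options.toString());
    }
    return uri.toString();
  }

  public static void main(String[] args) {
    Map<String, String> arguments = new LinkedHashMap<>();
    arguments.put("authSource", "admin");
    arguments.put("ssl", "true");
    System.out.println(build("localhost", null, "mydb", "alice", "p@ss word", arguments));
  }
}
```

With the sample values in `main`, the sketch yields `mongodb://alice:p%40ss%20word@localhost:27017/mydb?authSource=admin&ssl=true`. Using an insertion-ordered map keeps the option order stable, which makes the resulting string easier to compare in tests.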
Source Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI. | |
Reference Name | String | Uniquely identified name for lineage. | |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI) |
Port | Number | Port that MongoDB is listening on. | Optional (default 27017) |
Database | String | MongoDB database name. | Required |
Collection | String | Name of the database collection to read from. | Required |
Output Schema | Schema | Specifies the schema of the documents. | Required |
Input Query | String | Optionally filter the input collection with a query. This query must be represented in JSON format and use the MongoDB extended JSON format to represent non-native JSON data types. | |
Input Fields | String | Projection document that can limit the fields that appear in each document. This must be represented in JSON format, and use the MongoDB extended JSON format to represent non-native JSON data types. If no projection document is provided, all fields will be read. | |
Splitter Class | | The name of the Splitter class to use. If left empty, the MongoDB Hadoop Connector will attempt to make a best guess as to which Splitter to use. The Hadoop connector provides several Splitter implementations. | |
Username | String | User identity for connecting to the specified database. | |
Password | Password | Password to use to connect to the specified database. | |
Authentication Connection String | | Auxiliary MongoDB connection string to authenticate against when constructing splits. | |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. | |
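The Constraints column above implies a small validation step before the source runs: required properties must be present, and the port falls back to its default when omitted. The sketch below illustrates that check under stated assumptions. The class `MongoSourceConfigSketch` and the property keys (`host`, `port`, `database`, `collection`, `schema`) are hypothetical names chosen for the example, and the 1 to 65535 port range check is an addition that the table does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical validation sketch; property keys are illustrative, not the plugin's real names. */
class MongoSourceConfigSketch {

  static final int DEFAULT_PORT = 27017;

  // Properties marked "Required" in the source properties table.
  private static final String[] REQUIRED_KEYS = {"host", "database", "collection", "schema"};

  static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  /** Returns the configured port, or the default when the property is absent. */
  static int resolvePort(Map<String, String> properties) {
    String port = properties.get("port");
    return isBlank(port) ? DEFAULT_PORT : Integer.parseInt(port.trim());
  }

  /** Collects all problems instead of failing on the first one. */
  static List<String> validate(Map<String, String> properties) {
    List<String> errors = new ArrayList<>();
    for (String key : REQUIRED_KEYS) {
      if (isBlank(properties.get(key))) {
        errors.add("Missing required property: " + key);
      }
    }
    String port = properties.get("port");
    if (!isBlank(port)) {
      try {
        int parsed = Integer.parseInt(port.trim());
        // Assumption: only valid TCP port numbers are accepted.
        if (parsed < 1 || parsed > 65535) {
          errors.add("Port out of range: " + parsed);
        }
      } catch (NumberFormatException e) {
        errors.add("Port is not a number: " + port);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, String> properties = new HashMap<>();
    properties.put("host", "localhost");
    properties.put("database", "mydb");
    System.out.println(validate(properties));
    System.out.println(resolvePort(properties));
  }
}
```

Returning a list of errors rather than throwing on the first failure lets a configuration page report every problem at once.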
Data Types Mapping
MongoDB Data Type | CDAP Schema Data Type | Support | Comment |
---|---|---|---|
Double | Schema.Type.DOUBLE | + | |
String | Schema.Type.STRING | + | |
Object | Schema.Type.RECORD | + | |
Array | Schema.Type.ARRAY | + | |
Binary data | Schema.Type.BYTES | * | Value can be mapped to Schema.Type.BYTES, but this can lead to subtype information loss. There are several options: 1) Support only the 'generic' subtype. 2) Map using MongoDB extended JSON format: "binary": {"$binary": "YmluYXJ5IGRhdGE=", "$type": "00"} |
Undefined | Schema.Type.NULL | * | Can be mapped to Schema.Type.STRING using MongoDB extended JSON format: "undefined": {"$undefined": true} |
ObjectId | | * | Value can be mapped to Schema.Type.STRING, but this will lead to type information loss. There are several options: 1) Do not support this data type for the Sink 2) Map using MongoDB extended JSON format: {"$oid": "5d3f1c2a2f547625b0bbb397"} |
Boolean | Schema.Type.BOOLEAN | + | |
Date | Schema.LogicalType.TIMESTAMP_MILLIS | + | |
Null | Schema.Type.UNION | + | A ... nullable version of the actual type, corresponds to Schema.nullableOf(actualTypeSchema). |
Regular Expression | Schema.Type.STRING | * | Value can be mapped to Schema.Type.STRING, but this will lead to type information loss. There are several options: 1) Do not support this data type for the Sink 2) Map using MongoDB extended JSON format: "regex": {"$regex": ".", "$options": ""} |
DBPointer | Schema.Type.STRING | * | String in MongoDB extended JSON format: "dbpointer": {"$ref": "source", "$id": {"$oid": "5d079ee6d078c94008e4bb3a"}} |
JavaScript | Schema.Type.STRING | * | Value can be mapped to Schema.Type.STRING, but this will lead to type information loss. There are several options: 1) Do not support this data type for the Sink 2) Map using MongoDB extended JSON format: "javascript": {"$code": "var l = 1;"} |
Symbol | Schema.Type.STRING | * | Value can be mapped to Schema.Type.STRING, but this will lead to type information loss. There are several options: 1) Do not support this data type for the Sink 2) Map using MongoDB extended JSON format: "symbol": {"$symbol": "a"} |
JavaScript (with scope) | Schema.Type.STRING | * | Can be mapped to Schema.Type.STRING using MongoDB extended JSON format: "javascriptwithscope": {"$code": "var l = 1;", "$scope": {"scope": "scope_val"}} |
32-bit integer | Schema.Type.INT | + | |
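Timestamp | | * | Special type for internal MongoDB use, which is not associated with the regular Date type. Timestamp values are 64-bit values where the most significant 32 bits are a time_t value (seconds since the Unix epoch) and the least significant 32 bits are an incrementing ordinal for operations within a given second. Can be mapped to Schema.Type.STRING using MongoDB extended JSON format: "timestamp": {"$timestamp": {"t": 1564410161, "i": 1}} |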
64-bit integer | Schema.Type.LONG | + | |
Decimal128 | Schema.LogicalType.DECIMAL | + | |
Min key | | * | Is less than any other value of any type. This can be useful for always returning certain documents first (or last). Can be mapped to Schema.Type.STRING using MongoDB extended JSON format: "minkey": {"$minKey": 1} |
Max key | | * | Is greater than any other value of any type. This can be useful for always returning certain documents first (or last). Can be mapped to Schema.Type.STRING using MongoDB extended JSON format: "maxkey": {"$maxKey": 1} |
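The mapping table can be read as a lookup from MongoDB type to CDAP schema type for the rows marked `+`, with the `*` rows needing a per-type decision. The sketch below encodes only the `+` rows and, as one worked example of the extended JSON option, renders a raw 64-bit Timestamp value. Everything here is illustrative: the class `MongoTypeMappingSketch` and its methods are hypothetical, schema types are kept as plain strings so the example does not depend on the CDAP API, and the Timestamp layout (high 32 bits are seconds since the Unix epoch, low 32 bits are an ordinal) is the layout MongoDB documents for BSON timestamps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup sketch; schema types are plain strings, not CDAP Schema objects. */
class MongoTypeMappingSketch {

  // Rows marked "+" in the mapping table (fully supported).
  private static final Map<String, String> FULLY_SUPPORTED = new HashMap<>();

  static {
    FULLY_SUPPORTED.put("Double", "Schema.Type.DOUBLE");
    FULLY_SUPPORTED.put("String", "Schema.Type.STRING");
    FULLY_SUPPORTED.put("Object", "Schema.Type.RECORD");
    FULLY_SUPPORTED.put("Array", "Schema.Type.ARRAY");
    FULLY_SUPPORTED.put("Boolean", "Schema.Type.BOOLEAN");
    FULLY_SUPPORTED.put("Date", "Schema.LogicalType.TIMESTAMP_MILLIS");
    FULLY_SUPPORTED.put("Null", "Schema.Type.UNION");
    FULLY_SUPPORTED.put("32-bit integer", "Schema.Type.INT");
    FULLY_SUPPORTED.put("64-bit integer", "Schema.Type.LONG");
    FULLY_SUPPORTED.put("Decimal128", "Schema.LogicalType.DECIMAL");
  }

  /** Returns the schema type for a "+" row, or empty for "*" rows and unknown types. */
  static Optional<String> fullySupportedType(String mongoType) {
    return Optional.ofNullable(FULLY_SUPPORTED.get(mongoType));
  }

  /** Splits a raw 64-bit Timestamp into seconds (high 32 bits) and ordinal (low 32 bits). */
  static String timestampToExtendedJson(long raw) {
    long seconds = raw >>> 32;          // unsigned shift keeps the high half non-negative
    long ordinal = raw & 0xFFFFFFFFL;   // mask off the low 32 bits
    return "{\"$timestamp\": {\"t\": " + seconds + ", \"i\": " + ordinal + "}}";
  }

  public static void main(String[] args) {
    System.out.println(fullySupportedType("Date"));
    System.out.println(fullySupportedType("ObjectId"));
    System.out.println(timestampToExtendedJson((1564410161L << 32) | 1L));
  }
}
```

Returning an empty `Optional` for the `*` rows keeps the undecided cases explicit, so a caller has to choose between rejecting the type and falling back to an extended JSON string.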
Approach
Move the existing mongodb-plugins module to the mongodb-plugins project. Add MongoDB-specific properties to the configuration and add support for MongoDB-specific data types. Update the UI widget JSON definitions.
...