Introduction
A dedicated MongoDB database plugin is needed to support MongoDB-specific features and configurations.
Use-Case
- Users can choose and install MongoDB source and sink plugins.
- Users should see the MongoDB logo on the plugin configuration page for a better experience.
- Users should get relevant information from the tool tips:
  - Each tool tip should accurately describe what its field is used for.
- Users should not have to specify any redundant configuration.
- Users should get field-level lineage for the source and sink that are being used.
- Reference documentation should be updated to account for the changes.
- The source code for the MongoDB database plugin should be placed in a repository under the data-integrations organization.
- Integration tests for the MongoDB database plugin should be added to the test repository.
- Data pipelines that use the source and sink plugins should run on both the MapReduce and Spark engines.
User Stories
- Users should be able to install MongoDB-specific database source and sink plugins from the Hub
- Users should have each tool tip accurately describe what each field does
- Users should get field-level lineage information for the MongoDB source and sink
- Users should be able to set up a pipeline without specifying redundant information
- Users should get an updated reference document for the MongoDB source and sink
- Users should be able to read all data types that MongoDB supports
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design Tips
MongoDB driver reference: http://mongodb.github.io/mongo-java-driver/3.10/driver/
Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins
Design
The suggestion is to create a Maven submodule for MongoDB under the database-plugins repository.
Sink Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI. | |
Reference Name | String | Name that uniquely identifies this sink for lineage. | |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI) |
Port | Number | Port that MongoDB is listening on. | Optional (default 27017) |
Database | String | MongoDB database name. | Required |
Collection | String | Name of the database collection to write to. | Required |
Username | String | User identity for connecting to the specified database. | |
Password | Password | Password to use to connect to the specified database. | |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. | |
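
The properties above map naturally onto a standard MongoDB connection string of the form `mongodb://[username:password@]host[:port]/[database][?options]`. The following is a minimal sketch of that mapping, not the plugin's actual implementation: the class name `MongoConnectionStringSketch` and its `build` method are hypothetical, and it assumes the Connection Arguments are appended unchanged as `key=value` options.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of the plugin code base.
final class MongoConnectionStringSketch {

  // Default from the properties table: Port is optional and defaults to 27017.
  static final int DEFAULT_PORT = 27017;

  static String build(String host, Integer port, String database,
                      String username, String password,
                      Map<String, String> connectionArguments) {
    StringBuilder uri = new StringBuilder("mongodb://");
    // Credentials are optional; reserved characters must be percent-encoded.
    if (username != null && !username.isEmpty()) {
      uri.append(encode(username));
      if (password != null && !password.isEmpty()) {
        uri.append(':').append(encode(password));
      }
      uri.append('@');
    }
    uri.append(host).append(':').append(port == null ? DEFAULT_PORT : port);
    uri.append('/').append(database);
    // Assumption: connection arguments are passed through as URI options.
    if (connectionArguments != null && !connectionArguments.isEmpty()) {
      StringJoiner options = new StringJoiner("&", "?", "");
      for (Map.Entry<String, String> entry : connectionArguments.entrySet()) {
        options.add(entry.getKey() + "=" + entry.getValue());
      }
      uri.append(options);
    }
    return uri.toString();
  }

  private static String encode(String value) {
    try {
      // URLEncoder emits '+' for spaces; rewrite it as %20 for use in a URI.
      return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```

For example, host `localhost`, no port, database `mydb`, user `admin`, password `p@ss`, and a single argument `replicaSet=rs0` would yield `mongodb://admin:p%40ss@localhost:27017/mydb?replicaSet=rs0`.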
Source Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI. | |
Reference Name | String | Name that uniquely identifies this source for lineage. | |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI) |
Port | Number | Port that MongoDB is listening on. | Optional (default 27017) |
Database | String | MongoDB database name. | Required |
Collection | String | Name of the database collection to read from. | Required |
Output Schema | Schema | Specifies the schema of the documents. | Required |
Input Query | String | Optionally filter the input collection with a query. This query must be represented in JSON format and use the MongoDB extended JSON format to represent non-native JSON data types. | |
Input Fields | String | Projection document that can limit the fields that appear in each document. This must be represented in JSON format, and use the MongoDB extended JSON format to represent non-native JSON data types. If no projection document is provided, all fields will be read. | |
Splitter Class | | The name of the Splitter class to use. If left empty, the MongoDB Hadoop Connector will attempt to make a best guess as to which Splitter to use. The Hadoop Connector provides several Splitter implementations. | |
Username | String | User identity for connecting to the specified database. | |
Password | Password | Password to use to connect to the specified database. | |
Authentication Connection String | | Auxiliary MongoDB connection string to authenticate against when constructing splits. | |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. | |
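
The Constraints column above can be read as a set of validation rules for the source configuration. The sketch below illustrates one possible way to check them, assuming a plain holder class; `MongoSourceConfigSketch`, `effectivePort`, and `validate` are hypothetical names, not the plugin's API, and Output Schema and the optional fields are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical config holder for illustration only; not the plugin's real class.
final class MongoSourceConfigSketch {

  static final int DEFAULT_PORT = 27017;

  private final String host;
  private final Integer port; // optional, may be null
  private final String database;
  private final String collection;

  MongoSourceConfigSketch(String host, Integer port, String database, String collection) {
    this.host = host;
    this.port = port;
    this.database = database;
    this.collection = collection;
  }

  // Port is optional and falls back to the documented default.
  int effectivePort() {
    return port == null ? DEFAULT_PORT : port;
  }

  // Collects every problem instead of failing on the first one,
  // so that all invalid fields can be reported together.
  List<String> validate() {
    List<String> errors = new ArrayList<>();
    requireNonEmpty(host, "Host", errors);
    requireNonEmpty(database, "Database", errors);
    requireNonEmpty(collection, "Collection", errors);
    int p = effectivePort();
    if (p < 1 || p > 65535) {
      errors.add("Port must be between 1 and 65535, but was " + p);
    }
    return errors;
  }

  private static void requireNonEmpty(String value, String name, List<String> errors) {
    if (value == null || value.trim().isEmpty()) {
      errors.add(name + " is required");
    }
  }
}
```

Returning a list of messages rather than throwing on the first failure is a design choice made here for illustration; it lets a UI highlight every invalid field in one pass.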
Approach
Create a mongodb-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add MongoDB-specific properties to the configuration and add support for MongoDB-specific data types. Update the UI widget JSON definitions accordingly.