Introduction

A separate database plugin to support MongoDB-specific features and configurations.

Use-Case

  • Users can choose and install MongoDB source and sink plugins.
  • Users should see the MongoDB logo on the plugin configuration page for a better experience.
  • Users should get relevant information from the tool tip:
    • The tool tip should accurately describe what each field is used for.
  • Users should not have to specify any redundant configuration.
  • Users should get field-level lineage for the source and sink that are being used.
  • Reference documentation should be updated to account for the changes.
  • The source code for the MongoDB database plugin should be placed in a repository under the data-integrations organization.
  • Integration tests for the MongoDB database plugin should be added in the test repository.
  • A data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

  • Users should be able to install MongoDB-specific database source and sink plugins from the Hub.
  • Users should have each tool tip accurately describe what each field does.
  • Users should get field-level lineage information for the MongoDB source and sink.
  • Users should be able to set up a pipeline without specifying redundant information.
  • Users should get an updated reference document for the MongoDB source and sink.
  • Users should be able to read all MongoDB data types.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design Tips

MongoDB driver reference: http://mongodb.github.io/mongo-java-driver/3.10/driver/

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

Design

The suggestion is to create a Maven sub-module for MongoDB under the database-plugins repository.


Sink Properties

User Facing Name | Type | Description | Constraints
Label | String | Label for UI. |
Reference Name | String | Name that uniquely identifies this sink for lineage. |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI)
Port | Number | Port that MongoDB is listening on. | Optional (default 27017)
Database | String | MongoDB database name. | Required
Collection | String | Name of the database collection to write to. | Required
Username | String | User identity for connecting to the specified database. |
Password | Password | Password to use to connect to the specified database. |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. |
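
The way these properties could combine into a single MongoDB connection string can be sketched as follows. This is only an illustration under stated assumptions: the class MongoConnectionStringSketch and its build method are hypothetical names that do not exist in the plugin code, and the real plugin may assemble its connection differently. The sketch applies the default port 27017 when none is given, percent-encodes the credentials, and appends the Connection Arguments as query options.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of the existing plugin code. */
final class MongoConnectionStringSketch {
  static final int DEFAULT_PORT = 27017;

  /** Builds mongodb://[user[:password]@]host:port/database[?key=value&...]. */
  static String build(String host, Integer port, String database,
                      String username, String password, Map<String, String> arguments) {
    StringBuilder uri = new StringBuilder("mongodb://");
    if (username != null && !username.isEmpty()) {
      uri.append(encode(username));
      if (password != null && !password.isEmpty()) {
        uri.append(':').append(encode(password));
      }
      uri.append('@');
    }
    // Port is optional in the property table, so fall back to the default.
    uri.append(host).append(':').append(port == null ? DEFAULT_PORT : port);
    uri.append('/').append(database);
    if (arguments != null && !arguments.isEmpty()) {
      StringJoiner options = new StringJoiner("&", "?", "");
      for (Map.Entry<String, String> argument : arguments.entrySet()) {
        options.add(argument.getKey() + "=" + argument.getValue());
      }
      uri.append(options.toString());
    }
    return uri.toString();
  }

  /** Percent-encodes reserved characters such as '@' and ':' in credentials. */
  static String encode(String value) {
    try {
      // URLEncoder emits '+' for spaces; a connection string needs %20 instead.
      return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] unused) {
    Map<String, String> arguments = new LinkedHashMap<>();
    arguments.put("authSource", "admin");
    System.out.println(build("localhost", null, "sales", "app", "secret", arguments));
  }
}
```

Keeping the assembly in one place would let the source and sink share the same handling of defaults and encoding, which fits the goal of avoiding redundant configuration.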



Source Properties

User Facing Name | Type | Description | Constraints
Label | String | Label for UI. |
Reference Name | String | Name that uniquely identifies this source for lineage. |
Host | String | Host that MongoDB is running on. | Required (defaults to localhost on UI)
Port | Number | Port that MongoDB is listening on. | Optional (default 27017)
Database | String | MongoDB database name. | Required
Collection | String | Name of the database collection to read from. | Required
Output Schema | Schema | Specifies the schema of the documents. | Required
Input Query | String | Optionally filter the input collection with a query. This query must be represented in JSON format and use the MongoDB extended JSON format to represent non-native JSON data types. |
Input Fields | String | Projection document that can limit the fields that appear in each document. This must be represented in JSON format and use the MongoDB extended JSON format to represent non-native JSON data types. If no projection document is provided, all fields will be read. |
Splitter Class |  | The name of the Splitter class to use. If left empty, the MongoDB Hadoop Connector will attempt to make a best guess as to which Splitter to use. The Hadoop connector provides these Splitters: com.mongodb.hadoop.splitter.StandaloneMongoSplitter, com.mongodb.hadoop.splitter.ShardMongoSplitter, com.mongodb.hadoop.splitter.ShardChunkMongoSplitter, com.mongodb.hadoop.splitter.MultiMongoCollectionSplitter. |
Username | String | User identity for connecting to the specified database. |
Password | Password | Password to use to connect to the specified database. |
Authentication Connection String |  | Auxiliary MongoDB connection string to authenticate against when constructing splits. |
Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments. |
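
The constraints listed for the source properties could be checked along the following lines. This is a minimal sketch, not the plugin's actual configuration class: MongoSourceConfigSketch and its methods are hypothetical names, and the check on Input Query and Input Fields only looks at the outer braces rather than parsing JSON. The query and projection values in main are illustrative examples of the extended JSON format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical configuration holder for illustration; not the plugin's real class. */
final class MongoSourceConfigSketch {
  static final int DEFAULT_PORT = 27017;

  final String host;
  final Integer port;        // optional, defaults to 27017
  final String database;
  final String collection;
  final String inputQuery;   // optional, extended JSON document
  final String inputFields;  // optional, projection document

  MongoSourceConfigSketch(String host, Integer port, String database, String collection,
                          String inputQuery, String inputFields) {
    this.host = host;
    this.port = port;
    this.database = database;
    this.collection = collection;
    this.inputQuery = inputQuery;
    this.inputFields = inputFields;
  }

  int effectivePort() {
    return port == null ? DEFAULT_PORT : port;
  }

  /** Returns one message per violated constraint; an empty list means the config passes. */
  List<String> validate() {
    List<String> errors = new ArrayList<>();
    requireNonEmpty("Host", host, errors);
    requireNonEmpty("Database", database, errors);
    requireNonEmpty("Collection", collection, errors);
    int p = effectivePort();
    if (p < 1 || p > 65535) {
      errors.add("Port must be between 1 and 65535");
    }
    requireObjectShape("Input Query", inputQuery, errors);
    requireObjectShape("Input Fields", inputFields, errors);
    return errors;
  }

  private static void requireNonEmpty(String name, String value, List<String> errors) {
    if (value == null || value.trim().isEmpty()) {
      errors.add(name + " is required");
    }
  }

  // Cheap shape check only: a real implementation would parse the document.
  private static void requireObjectShape(String name, String value, List<String> errors) {
    if (value == null || value.trim().isEmpty()) {
      return; // optional property
    }
    String trimmed = value.trim();
    if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
      errors.add(name + " must be a JSON document");
    }
  }

  public static void main(String[] unused) {
    MongoSourceConfigSketch config = new MongoSourceConfigSketch(
        "localhost", null, "sales", "orders",
        "{\"created\": {\"$gte\": {\"$date\": \"2019-01-01T00:00:00Z\"}}}",
        "{\"customer\": 1, \"total\": 1, \"_id\": 0}");
    System.out.println(config.validate());
  }
}
```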



Approach

Create a module mongodb-plugin in the database-plugins project, reusing existing database-plugins code where possible. Add MongoDB-specific properties to the configuration and add support for MongoDB-specific data types. Update the UI widget JSON definitions.
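
Support for MongoDB-specific data types could start from a lookup table such as the one sketched below. The BSON type aliases on the left are the ones MongoDB uses, but the schema type names on the right and the choice of which types to support are assumptions made for illustration, not a mapping defined by this design. BsonTypeMappingSketch is a hypothetical class name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapping for illustration; the target type names are assumptions. */
final class BsonTypeMappingSketch {
  private static final Map<String, String> BSON_TO_SCHEMA;

  static {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("bool", "BOOLEAN");
    mapping.put("int", "INT");
    mapping.put("long", "LONG");
    mapping.put("double", "DOUBLE");
    mapping.put("decimal", "DECIMAL");
    mapping.put("string", "STRING");
    mapping.put("objectId", "STRING");   // assumed: rendered as its string form
    mapping.put("binData", "BYTES");
    mapping.put("date", "TIMESTAMP");
    mapping.put("array", "ARRAY");
    mapping.put("object", "RECORD");     // assumed: nested document becomes a record
    mapping.put("null", "NULL");
    BSON_TO_SCHEMA = Collections.unmodifiableMap(mapping);
  }

  /** Looks up the schema type for a BSON type alias and rejects unmapped types. */
  static String toSchemaType(String bsonType) {
    String schemaType = BSON_TO_SCHEMA.get(bsonType);
    if (schemaType == null) {
      throw new IllegalArgumentException("Unsupported BSON type: " + bsonType);
    }
    return schemaType;
  }

  public static void main(String[] unused) {
    for (String bsonType : BSON_TO_SCHEMA.keySet()) {
      System.out.println(bsonType + " -> " + toSchemaType(bsonType));
    }
  }
}
```

Failing fast on unmapped types, rather than silently converting them to strings, would make gaps in type coverage visible during integration testing.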

Pipeline Samples


Releases

Release X.Y.Z

Related Work

Database plugin enhancements
