Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction

A separate database plugin to support MongoDB-specific features and configurations.

...

The following example uses '{ status: { $in: [ "A", "D" ] } }' query filter document to retrieve all documents from the 'inventory' collection where 'status' equals either "A" or "D":

...

The following MongoDB data types are missing: Undefined, Regular Expression, DBPointer, JavaScript, Symbol, JavaScript (with scope), Timestamp, Min key, Max key.

CDAP Schema Data TypeMongoDB Data TypesComment
booleanBoolean
bytesBinary data, ObjectId(if 'ID Field' specified)
dateDate

doubleDouble
decimalDecimal128The Decimal128 type supports up to 34 digits of precision.
floatDouble

int32-bit integer
long64-bit integer
stringString, ObjectId(if 'ID Field' specified)
timeString
timestampDate
arrayArray
record

Object


enum
Array
String
map

Object


union

Depends on the actual value.

For example, if it's a union:

Code Block
["string","int","long"]

and the value is actually a long, the mongo document will have the field as a 64-bit integer. If a different record comes in with the value as a string, the mongo document will end up with a String for that field.



Source Properties

User Facing NameWidget TypeDescriptionConstraints
LabeltextboxLabel for UI.
Reference NametextboxUniquely identified name for lineage.
HosttextboxHost that MongoDB is running on.

Required

(defaults to localhost on UI)

PortnumberPort that MongoDB is listening to.

Optional

(default 27017)

DatabasetextboxMongoDB database name.Required
CollectiontextboxName of the database collection to write to.Required
Output SchemaschemaSpecifies the schema of the documents.Required
On Record Errorselectradio-groupSpecifies how to handle error in record processing. An error will be thrown if failed to parse value according to a provided schema.

Possible values are:

  • Skip error
  • Fail pipelineWrite to error dataset

Default: 'Fail pipeline'

Input Queryjson-editorOptionally filter the input collection with a query. This query must be represented in JSON format and use the MongoDB extended JSON format to represent non-native JSON data types.
UsernametextboxUser identity for connecting to the specified database.
PasswordpasswordPassword to use to connect to the specified database.
Authentication Connection StringtextboxAuxiliary MongoDB connection string to authenticate against when constructing splits.
Connection Argumentskeyvalue

A list of arbitrary string key/value pairs as connection arguments. See Connection String Options for a full description of these arguments.


...

The source requires Output Schema to be set. Based on the schema source will expect a field in each document to be of a specific Mongo data type.

On Record Error error handling property allows the user to decide whether the pipeline should fail, the record should be skipped, or the record should be sent to the error dataset.

...

CDAP Schema Data TypeMongoDB Data Types
booleanBoolean
bytesBinary data, ObjectId
date-
doubleDouble
decimalDecimal128
float-
int32-bit integer
long64-bit integer
string

String, Symbol

time-
timestampDate
arrayArray
record

Object

The following schema:

Code Block
{"type":"record","name":"object","fields":[{"name":"inner_field","type":"string"}]}

is used for 'object' field:

Code Block
{
 "object" : {
        "inner_field" : "val"
    }
}


* We can map all non-standard data types to record, like JavaScript (with scope) in the example below.

The following schema:

Code Block
{
  "type":"record",
  "name":"javascriptwithscope",
  "fields":[
    {"name":"$code","type":"string"},
    {"name":"$scope","type":{"type":"record","name":"scope-object-record","fields"[{"name":"scope","type":"string"}]}}
  ]
}

is used for 'javascriptwithscope' field:

Code Block
{
  "javascriptwithscope" : { "$code" : var l = 1; ,  "$scope" : { "scope" : "scope_val" } }
}
enum-
map

Object

The following schema:

Code Block
{"type":"map","keys":"string","values":"string"}

is used for 'object' field:

Code Block
{
 "object" : {
        "inner_field" : "val"
    }
}


union

-

...