Introduction

...

{
    "_id" : ObjectId("5d3f1c2a2f547625b0bbb397"),
    "string" : "AAPL",
    "int32" : 10,
    "double" : 23.23,
    "array" : [ 
        "a1", 
        "a2"
    ],
    "object" : {
        "inner_field" : "val"
    },
    "binary" : { "$binary" : "YmluYXJ5IGRhdGE=", "$type" : "00" },
    "undefined" : undefined,
    "boolean" : false,
    "date" : ISODate("2019-07-29T16:17:46.109Z"),
    "null" : null,
    "regex" : /./,
    "dbpointer" : DBRef("source", "5d079ee6d078c94008e4bb3a"),
    "javascript" : { "$code" : "var l = 1;" },
    "javascriptwithscope" : { "$code" : "var l = 1;", "$scope" : { "scope" : "scope_val" } },
    "symbol" : "a",
    "timestamp" : Timestamp(1564417066, 1),
    "long" : NumberLong(9223372036854775807),
    "decimal" : NumberDecimal("3.100000"),
    "minkey" : { "$minKey" : 1 },
    "maxkey" : { "$maxKey" : 1 }
}

BSON

BSON is a binary serialization format used to store documents and make remote procedure calls in MongoDB. The BSON specification is available at bsonspec.org.

Document limitations

  • The maximum BSON document size is 16 megabytes.
  • In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
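
The _id rule above can be illustrated with a minimal sketch. It is not the driver's implementation: the function names `new_object_id_hex` and `ensure_id` are hypothetical, and the hex string is only a stand-in for a real ObjectId. It assumes the usual 12-byte ObjectId layout (4-byte timestamp in seconds, 5-byte random value, 3-byte counter).

```python
import itertools
import os
import struct
import time

# Per-process random value and counter start (assumption: standard ObjectId layout).
_PROCESS_RANDOM = os.urandom(5)
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id_hex() -> str:
    """Build a 24-character hex string shaped like an ObjectId."""
    timestamp = struct.pack(">I", int(time.time()))           # 4 bytes, big-endian seconds
    counter = (next(_COUNTER) % 0x1000000).to_bytes(3, "big")  # 3 bytes, wraps at 2^24
    return (timestamp + _PROCESS_RANDOM + counter).hex()


def ensure_id(document: dict) -> dict:
    """Return the document unchanged if it has _id, else a copy with _id placed first."""
    if "_id" in document:
        return document
    return {"_id": new_object_id_hex(), **document}


with_id = ensure_id({"string": "AAPL", "int32": 10})
print(list(with_id)[0])  # prints _id
```

A caller-supplied _id is never overwritten, which mirrors the behaviour described above: generation happens only when the field is omitted.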

...

The following example uses '{ status: { $in: [ "A", "D" ] } }' query filter document to retrieve all documents from the 'inventory' collection where 'status' equals either "A" or "D":

...

MongoDB Data Type | CDAP Schema Data Type | Support | Comment
Double | Schema.Type.DOUBLE | +
String | Schema.Type.STRING | +
Object | Schema.Type.RECORD | +
Array | Schema.Type.ARRAY | +
Binary data | Schema.Type.BYTES | *

Value can be mapped to Schema.Type.BYTES, but this can lead to subtype information loss.

Subtypes:


  • generic: \x00 (0)
  • function: \x01 (1)
  • old: \x02 (2)
  • uuid_old: \x03 (3)
  • uuid: \x04 (4)
  • md5: \x05 (5)
  • user: \x80 (128)


There are several options:

1) Support only 'generic' subtype.

2) Map using MongoDB extended JSON format:

"binary": {"$binary": "YmluYXJ5IGRhdGE=", "$type": "00"}

Undefined | Schema.Type.NULL | *

Can be mapped to Schema.Type.STRING using MongoDB extended JSON format:

"undefined": {"$undefined": true}

ObjectId |  | *

Value can be mapped to Schema.Type.STRING, but this will lead to type information loss.

There are several options:

1) Do not support this data type for the Sink

2) Map using MongoDB extended JSON format: {"$oid": "5d3f1c2a2f547625b0bbb397"}

Boolean | Schema.Type.BOOLEAN | +
Date | Schema.LogicalType.TIMESTAMP_MILLIS | +
Null | Schema.Type.UNION | + | A nullable version of the actual type; corresponds to Schema.nullableOf(actualTypeSchema).
Regular Expression | Schema.Type.STRING | *

Value can be mapped to Schema.Type.STRING, but this will lead to type information loss.

There are several options:

1) Do not support this data type for the Sink

2) Map using MongoDB extended JSON format: "regex": {"$regex": ".", "$options": ""}

DBPointer | Schema.Type.STRING | *

String in MongoDB extended JSON format:

"dbpointer": {"$ref": "source", "$id": {"$oid": "5d079ee6d078c94008e4bb3a"}}

JavaScript | Schema.Type.STRING | *

Value can be mapped to Schema.Type.STRING, but this will lead to type information loss.

There are several options:

1) Do not support this data type for the Sink

2) Map using MongoDB extended JSON format: "javascript": {"$code": "var l = 1;"}

Symbol | Schema.Type.STRING | *

Value can be mapped to Schema.Type.STRING, but this will lead to type information loss.

There are several options:

1) Do not support this data type for the Sink

2) Map using MongoDB extended JSON format: "symbol": {"$symbol": "a"}

JavaScript (with scope) | Schema.Type.STRING | *

Can be mapped to Schema.Type.STRING using MongoDB extended JSON format:

"javascriptwithscope": {"$code": "var l = 1;", "$scope": {"scope": "scope_val"}}

32-bit integer | Schema.Type.INT | +
Timestamp |  | *

A special type for internal MongoDB use, unrelated to the regular Date type. A Timestamp is a 64-bit value where:

  • the most significant 32 bits are a time_t value (seconds since the Unix epoch)
  • the least significant 32 bits are an incrementing ordinal for operations within a given second.

Can be mapped to Schema.Type.STRING using MongoDB extended JSON format:

"timestamp": {"$timestamp": {"t": 1564410161, "i": 1}}

64-bit integer | Schema.Type.LONG | +
Decimal128 | Schema.LogicalType.DECIMAL | +
Min key |  | *

Compares lower than any other value of any type. This can be useful for always returning certain documents first (or last).

Can be mapped to Schema.Type.STRING using MongoDB extended JSON format:

"minkey": {"$minKey": 1}

Max key |  | *

Compares higher than any other value of any type. This can be useful for always returning certain documents first (or last).

Can be mapped to Schema.Type.STRING using MongoDB extended JSON format:

"maxkey": {"$maxKey": 1}
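
Two of the extended JSON mappings proposed in the table, Binary data and Timestamp, can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the function names are hypothetical, only the Python standard library is used, and the Timestamp helper assumes the 64-bit value holds the seconds in its upper 32 bits and the ordinal in its lower 32 bits.

```python
import base64
import json


def binary_to_extended_json(data: bytes, subtype: int = 0) -> dict:
    """Keep the binary subtype next to the payload so it is not lost in the mapping."""
    if not 0 <= subtype <= 0xFF:
        raise ValueError("BSON binary subtype must fit in one byte")
    return {
        "$binary": base64.b64encode(data).decode("ascii"),  # payload as base64 text
        "$type": format(subtype, "02x"),                    # subtype as two hex digits
    }


def timestamp_to_extended_json(value: int) -> dict:
    """Split a 64-bit Timestamp into seconds (upper 32 bits) and ordinal (lower 32 bits)."""
    return {"$timestamp": {"t": value >> 32, "i": value & 0xFFFFFFFF}}


# Values taken from the examples in the table; the result is a string,
# which is what a Schema.Type.STRING field would carry.
print(json.dumps(binary_to_extended_json(b"binary data")))
# prints {"$binary": "YmluYXJ5IGRhdGE=", "$type": "00"}

raw_timestamp = (1564410161 << 32) | 1
print(json.dumps(timestamp_to_extended_json(raw_timestamp)))
# prints {"$timestamp": {"t": 1564410161, "i": 1}}
```

Carrying the subtype in "$type" is what distinguishes option 2 from option 1 for Binary data: a uuid value (subtype 4) would serialize with "$type": "04" instead of being treated as generic bytes.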

...