Comment: Added Source and Sink properties

Introduction

Google Cloud Datastore is a NoSQL document database offered by Google on the Google Cloud Platform, built for automatic scaling, high performance, and ease of application development. Cloud Datastore is built on Google's Bigtable and Megastore technology.

Use case(s)

  • Users would like to build a batch data pipeline that reads a complete table from a Google Cloud Datastore instance.
  • Users would like to build a batch data pipeline that performs inserts / upserts into Google Cloud Datastore tables.
  • Users should get relevant information from the tool tip while configuring the Google Cloud Datastore source and Google Cloud Datastore sink
    • The tool tip should describe accurately what each field is used for
  • Users should get field level lineage for the source and sink that is being used
  • Reference documentation should be available from the source and sink plugins

User Stories

  • Source code in data integrations org
  • Integration test code 
  • Relevant documentation in the source repo and reference documentation section in plugin

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Configurables

This section defines properties that are configurable for this plugin. 

Design

Properties

Source

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Project ID | String | Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console. | Required. |
| JSON key file path | String | Path on the local file system to the service account JSON key file used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. https://cloud.google.com/storage/docs/authentication#generating-a-private-key | Required. |
| Namespace | String | A namespace partitions entities into a subset of Datastore. https://cloud.google.com/datastore/docs/concepts/multitenancy | Optional. If not provided, the [default] namespace is used. |
| Kind | String | The kind of an entity categorizes it for the purpose of Datastore queries. Equivalent to the notion of a table in a relational database. https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers | Optional. Should be empty if GQL is specified. |
| GQL | String | SQL-like language that allows querying data by specific kind or keys, with the option to apply various filter conditions. For example: SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200 https://cloud.google.com/datastore/docs/concepts/queries https://cloud.google.com/datastore/docs/reference/gql_reference | Optional. Should be empty if Kind is specified. |
| Include Key | Boolean | The key is the unique identifier assigned to the entity when it is created. If this property is set to true, a __key__ column must be present in the schema definition, and its type should be String or Long. Needed when performing upserts to Cloud Datastore. https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers | Required. False by default. |
| Eventually Consistent | Boolean | To improve performance, the user can set an eventually consistent read policy for ancestor queries. Note that this option has no effect on global queries, since they are always eventually consistent regardless of the policy. https://cloud.google.com/datastore/docs/concepts/queries#ancestor_queries https://cloud.google.com/datastore/docs/concepts/structuring_for_strong_consistency | Required. False by default. |
| Schema | JSON schema | The schema of records output by the source. It is mapped to the data returned by the query. Should contain column name, type and nullability. Can be imported or obtained using the Get Schema button. | Required. |
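
The constraints in the Source table can be read as a small set of validation rules. The following is a minimal sketch in Python, not the plugin's actual implementation: the `SourceConfig` class, its field names, the `validate_source` function, and the lowercase type names in the schema mapping are all hypothetical. It also assumes that exactly one of Kind or GQL must be provided; the table only states that each should be empty when the other is specified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

KEY_FIELD = "__key__"  # key column name taken from the Include Key description


@dataclass
class SourceConfig:
    """Hypothetical container mirroring the Source properties table."""
    project_id: str
    key_file_path: str
    schema: Dict[str, str]  # field name -> type name (assumed lowercase)
    namespace: Optional[str] = None
    kind: Optional[str] = None
    gql: Optional[str] = None
    include_key: bool = False
    eventually_consistent: bool = False


def validate_source(cfg: SourceConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors: List[str] = []
    if not cfg.project_id:
        errors.append("Project ID is required")
    if not cfg.key_file_path:
        errors.append("JSON key file path is required")
    # Assumption: exactly one of Kind / GQL is set (both or neither is an error).
    if bool(cfg.kind) == bool(cfg.gql):
        errors.append("Exactly one of Kind or GQL must be set")
    if not cfg.schema:
        errors.append("Schema is required")
    # Include Key = true requires a __key__ column of type String or Long.
    if cfg.include_key and cfg.schema.get(KEY_FIELD) not in ("string", "long"):
        errors.append("__key__ must be in the schema as string or long")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a configuration UI report every problem at once.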

Sink

| User Facing Name | Type | Description | Constraints |

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(es)

Properties
| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Project ID | String | Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console. | Required. |
| JSON key file path | String | Path on the local file system to the service account JSON key file used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. https://cloud.google.com/storage/docs/authentication#generating-a-private-key | Required. |
| Namespace | String | A namespace partitions entities into a subset of Datastore. https://cloud.google.com/datastore/docs/concepts/multitenancy | Optional. If not provided, the [default] namespace is used. |
| Kind | String | The kind of an entity categorizes it for the purpose of Datastore queries. Equivalent to the notion of a table in a relational database. https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers | Required. |
| Indexed Properties | List<String> | List of property names to be marked as indexed. A property is equivalent to the notion of a column in a relational database. https://cloud.google.com/datastore/docs/concepts/indexes | Optional. If not specified, all properties are considered indexed by default. |
| Allow Generated Key | Boolean | The key is the unique identifier assigned to the entity when it is created. The user can specify a custom key for the entity, or an already existing key to perform upserts. If this property is set to false, a __key__ field must be present in the schema definition. Otherwise, Cloud Datastore automatically assigns a numeric ID to the entity. https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers | Required. True by default. |
| Ancestors | List<String> | The ancestor path identifies the common root entity under which the created entities are grouped. Each ancestor must have a kind and a key. The key can be named (represented as String) or numeric (represented as Long). https://cloud.google.com/datastore/docs/concepts/structuring_for_strong_consistency | Optional. |
| Transactional | Boolean | Datastore commits are either transactional, meaning they take place in the context of a transaction and the transaction's set of mutations is applied either entirely or not at all, or non-transactional, meaning the set of mutations may not be applied as all or none. https://cloud.google.com/datastore/docs/concepts/transactions | Required. False by default. |
| Schema | JSON schema | The schema of records to be written. Should contain column name, type and nullability. Can be imported. | Required. |
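
The key-related sink properties (Allow Generated Key, Ancestors, Indexed Properties) interact with each other, so a short sketch may help. The Python below is illustrative only and is not the plugin's implementation: the function names are hypothetical, and the `Kind:key` string encoding for ancestors is an assumption, since this page does not define how each ancestor entry is written. It also assumes that an all-digit key denotes a numeric ID and anything else denotes a name.

```python
from typing import Dict, List, Optional, Tuple, Union

KEY_FIELD = "__key__"  # key field name taken from the Allow Generated Key description
KeyId = Union[str, int]


def parse_ancestor(spec: str) -> Tuple[str, KeyId]:
    """Parse one 'Kind:key' entry (hypothetical encoding) into (kind, key)."""
    kind, sep, raw = spec.partition(":")
    if not sep or not kind or not raw:
        raise ValueError(f"ancestor needs both kind and key: {spec!r}")
    # Assumption: all-digit keys are numeric IDs, anything else is a name.
    return kind, int(raw) if raw.isdecimal() else raw


def build_key_path(record: Dict[str, object], kind: str, ancestors: List[str],
                   allow_generated_key: bool) -> List[Tuple[str, Optional[object]]]:
    """Build the (kind, id) path for one record, ancestors first."""
    path: List[Tuple[str, Optional[object]]] = [parse_ancestor(a) for a in ancestors]
    if allow_generated_key:
        # None marks an incomplete key: Datastore assigns the numeric ID.
        path.append((kind, None))
    else:
        if KEY_FIELD not in record:
            raise ValueError(f"{KEY_FIELD} is required when generated keys are disabled")
        path.append((kind, record[KEY_FIELD]))
    return path


def unindexed_properties(schema_fields: List[str], indexed: List[str]) -> List[str]:
    """Fields to exclude from indexes; an empty `indexed` list indexes everything."""
    if not indexed:
        return []
    return [f for f in schema_fields if f not in indexed and f != KEY_FIELD]
```

For example, `build_key_path({"name": "x"}, "Task", ["Project:42"], True)` yields a path whose last element has no ID, which corresponds to letting Cloud Datastore generate one.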

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2




Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature