Introduction

A batch sink that pushes data from Hydrator pipelines into DynamoDB tables.

Use case(s)

  • An organization wants to parse the logs generated by a system and store the metadata in DynamoDB tables.

User Stories

    • User should be able to provide the table name in DynamoDb. 
    • User should be able to provide the primary key of the table. 
    • User should be able to provide the type of primary key (hash or range).
    • The table should be created if it does not already exist.
    • User should be able to provide the AWS endpoint url for DynamoDb instance.
    • User should be able to provide the AWS region id for DynamoDb instance.
    • User should be able to provide the AWS access id.
    • User should be able to provide the AWS access key.

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Configurables

This section defines properties that are configurable for this plugin. 

| User Facing Name   | Type                     | Description                                                 | Constraints                                              |
|--------------------|--------------------------|-------------------------------------------------------------|----------------------------------------------------------|
| Table name         | String                   | Name of the DynamoDB table                                  | Naming convention constraints from AWS                   |
| Primary key fields | List<Map<String,String>> | Primary key fields of the table                             | There should be at least 1 primary key                   |
| endpoint url       | String                   | AWS endpoint url for DynamoDb instance                      | Optional, could be reconstructed using regionId          |
| region id          | String                   | AWS region id for DynamoDb instance                         |                                                          |
| throughput         | Int                      | Intended throughput for DynamoDb                            | Optional                                                 |
| access id          | String                   | AWS access id                                               |                                                          |
| access key         | password                 | AWS access key                                              |                                                          |
| Primary key types  | List<Map<String,String>> | Key types for the primary keys, used for creating the table | The primary key type can only have 2 values: HASH and RANGE |
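
The two constraints in the table (at least one primary key field, and key types limited to HASH and RANGE) could be checked before the sink runs. The following is a minimal sketch in plain Java, not the plugin's actual validation code; the class name PrimaryKeyConfigCheck and the method problems are hypothetical, and the configuration is assumed to have already been parsed into maps of field name to value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects human-readable problems instead of failing on the first one.
final class PrimaryKeyConfigCheck {
    private PrimaryKeyConfigCheck() { }

    static List<String> problems(Map<String, String> keyFields, Map<String, String> keyTypes) {
        List<String> problems = new ArrayList<>();

        // Constraint from the table: there should be at least 1 primary key.
        if (keyFields == null || keyFields.isEmpty()) {
            problems.add("At least one primary key field is required.");
        }

        // Constraint from the table: the key type can only be HASH or RANGE.
        if (keyTypes != null) {
            for (Map.Entry<String, String> entry : keyTypes.entrySet()) {
                String type = entry.getValue();
                if (!"HASH".equals(type) && !"RANGE".equals(type)) {
                    problems.add("Key type for '" + entry.getKey()
                        + "' must be HASH or RANGE, got '" + type + "'.");
                }
            }
        }
        return problems;
    }
}
```

An empty list means both constraints hold; a caller could join the messages into a single configuration error.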

Design / Implementation Tips

Design

DynamoDB Sink JSON format:

{
    "name": "DynamoDb",
    "type": "batchsink",
    "properties": {
        "endpointUrl": "",
        "regionId": "us-east-1",
        "accessKey": "xyz",
        "secretAccessKey": "abc",
        "tableName": "Movies",
        "primaryKeyFields": "Id:N",
        "primaryKeyTypes": "Id:HASH",
        "throughput": "10"
    }
}
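
The primaryKeyFields and primaryKeyTypes values above are "name:value" strings. A small sketch of how such a string might be turned into a map is shown below. The class name KeyValueSpec is hypothetical, and the comma as a separator between multiple pairs is an assumption, since the example shows only a single pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for values such as "Id:N" or "Id:HASH".
final class KeyValueSpec {
    private KeyValueSpec() { }

    // Returns the pairs in declaration order; an empty or null spec yields an empty map.
    static Map<String, String> parse(String spec) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        // Assumption: multiple pairs are comma-separated, e.g. "Id:N,Title:S".
        for (String pair : spec.split(",")) {
            String[] parts = pair.split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected 'name:value' but got '" + pair + "'");
            }
            result.put(parts[0].trim(), parts[1].trim());
        }
        return result;
    }
}
```

With the sample configuration, parsing "Id:N" gives a map with the single entry Id to N, and parsing "Id:HASH" gives Id to HASH.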

 

Approach(s)

  1. A drop-down listing the supported regions will be provided so the user can select the AWS DynamoDB region to connect to. Supported regions are:
    "us-gov-west-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2", "eu-west-1", "eu-west-2", "eu-central-1", "ap-south-1", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ap-northeast-2", "sa-east-1", "cn-north-1", "ca-central-1", "getCurrentRegion".
    (Referred from: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/ and http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region)

  2. If the user does not select a region, the default region, us-west-2, will be used.

  3. Selecting getCurrentRegion from the list returns a Region object representing the region the application is running in when it runs on EC2. When called from a non-EC2 environment, it returns null.

  4. The plugin will support the following CDAP data types in the schema: String, Number (int, long, float, double), Bytes, Boolean, NULL, Map, List, Array of String and Array of Number.

  5. Key-value drop-down to take the names of the primary key fields and their attribute types. The drop-down will allow the following values: String, Number (int, long, float, double), Boolean, NULL, Map, List, Array of String and Array of Number.

  6. Key value drop-down to take the name of the primary key fields and key type. The drop-down will have the following values: "N"(number) and "S"(string).
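
The region handling in items 1 and 2 can be sketched as follows. This is an illustration in plain Java, not the plugin's code: the class name RegionSelection is hypothetical, the default is written in the hyphenated form used by the supported-region list, and the special "getCurrentRegion" entry is left out because resolving it needs the AWS SDK and an EC2 environment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that applies the default-region rule to a drop-down selection.
final class RegionSelection {
    // Default used when the user selects nothing (item 2).
    static final String DEFAULT_REGION = "us-west-2";

    // Region ids from item 1, excluding the special "getCurrentRegion" entry.
    static final List<String> SUPPORTED = Arrays.asList(
        "us-gov-west-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2",
        "eu-west-1", "eu-west-2", "eu-central-1", "ap-south-1", "ap-southeast-1",
        "ap-southeast-2", "ap-northeast-1", "ap-northeast-2", "sa-east-1",
        "cn-north-1", "ca-central-1");

    private RegionSelection() { }

    static String resolve(String selected) {
        if (selected == null || selected.trim().isEmpty()) {
            return DEFAULT_REGION;
        }
        String id = selected.trim();
        if (!SUPPORTED.contains(id)) {
            throw new IllegalArgumentException("Unsupported region: " + id);
        }
        return id;
    }
}
```

Rejecting unknown ids early keeps a mistyped region from surfacing later as a connection failure.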

 

Properties

  • endpointUrl: AWS endpoint URL for DynamoDB (see http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region). Optional; it can be reconstructed from regionId.
  • regionId: The region for AWS Dynamo DB to connect to.
  • accessKey: Access key for AWS Dynamo DB.
  • secretAccessKey: Secret access key for AWS Dynamo DB.
  • tableName: The table to write the data to.
  • primaryKeyFields: The field names to be used as the primary key and their attribute types.
  • primaryKeyTypes: Primary key field names and their key types (HASH or RANGE).
  • throughput: Intended throughput for DynamoDb.
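
Since endpointUrl can be reconstructed from regionId, a fallback along the following lines is one option. This is a sketch under stated assumptions rather than the plugin's implementation: the class name DynamoDbEndpoints is hypothetical, and the host pattern dynamodb.<region>.amazonaws.com (with the .com.cn suffix for China regions) is assumed from the AWS regions and endpoints page referenced above.

```java
// Hypothetical helper: prefers an explicit endpoint, otherwise derives one from the region id.
final class DynamoDbEndpoints {
    private DynamoDbEndpoints() { }

    static String resolve(String endpointUrl, String regionId) {
        // An explicitly configured endpoint always wins.
        if (endpointUrl != null && !endpointUrl.trim().isEmpty()) {
            return endpointUrl.trim();
        }
        if (regionId == null || regionId.trim().isEmpty()) {
            throw new IllegalArgumentException("Either endpointUrl or regionId must be provided");
        }
        String region = regionId.trim();
        // Assumption: China regions use the amazonaws.com.cn domain.
        String suffix = region.startsWith("cn-") ? "amazonaws.com.cn" : "amazonaws.com";
        return "https://dynamodb." + region + "." + suffix;
    }
}
```

With the sample configuration (empty endpointUrl, regionId "us-east-1"), this yields https://dynamodb.us-east-1.amazonaws.com.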

Security

  • The AWS access keys should be password fields and macro-enabled.

Limitation(s)

Future Work

Test Case(s)

Sample Pipeline

 

Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature