Introduction
A batch sink that writes data from Hydrator pipelines into DynamoDB tables.
Use case(s)
- An organization wants to parse the logs generated by a system and store the metadata in DynamoDB tables.
User Stories
- User should be able to provide the table name in DynamoDB.
- User should be able to provide the primary key of the table.
- User should be able to provide the type of primary key (hash or range).
- The table should be created if it does not already exist.
- User should be able to provide the AWS endpoint URL for the DynamoDB instance.
- User should be able to provide the AWS region id for the DynamoDB instance.
- User should be able to provide the AWS access id.
- User should be able to provide the AWS access key.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Table name | String | Name of the DynamoDB table | Naming convention constraints from AWS |
Primary key fields | List<Map<String,String>> | Primary key fields of the table | There should be at least 1 primary key |
endpoint url | String | AWS endpoint URL for the DynamoDB instance | Optional, could be reconstructed using regionId |
region id | String | AWS region id for the DynamoDB instance | |
throughput | Int | Intended throughput for DynamoDB | (Optional) |
access id | String | AWS access id | |
access key | password | AWS access key | |
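The constraints in the table above could be checked before a run with something like the following minimal sketch. The class and method names are hypothetical and not part of the plugin. It assumes the AWS table naming rule (3 to 255 characters from a-z, A-Z, 0-9, underscore, hyphen and dot) and the DynamoDB key schema rule of exactly one hash key plus at most one range key. For brevity it takes the key fields as a single map of field name to key type rather than the List<Map<String,String>> listed above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the plugin: validates sink settings before a run. */
final class DynamoDbSinkConfigCheck {
  // AWS naming rule for tables: 3 to 255 characters from a-z, A-Z, 0-9, '_', '-', '.'.
  private static final Pattern TABLE_NAME = Pattern.compile("[a-zA-Z0-9_.-]{3,255}");

  private DynamoDbSinkConfigCheck() { }

  static boolean isValidTableName(String name) {
    return name != null && TABLE_NAME.matcher(name).matches();
  }

  /**
   * Checks a map of key field name to key type ("hash" or "range").
   * Returns null when the key definition is valid, otherwise a message describing the problem.
   */
  static String validatePrimaryKey(Map<String, String> keyFields) {
    if (keyFields == null || keyFields.isEmpty()) {
      return "At least one primary key field is required";
    }
    int hash = 0;
    int range = 0;
    for (Map.Entry<String, String> e : keyFields.entrySet()) {
      String type = e.getValue() == null ? "" : e.getValue().trim().toLowerCase(Locale.ROOT);
      if (type.equals("hash")) {
        hash++;
      } else if (type.equals("range")) {
        range++;
      } else {
        return "Unknown key type for field '" + e.getKey() + "': " + e.getValue();
      }
    }
    if (hash != 1) {
      return "Exactly one hash key is required, found " + hash;
    }
    if (range > 1) {
      return "At most one range key is allowed, found " + range;
    }
    return null;
  }
}
```

Returning a message instead of throwing keeps the sketch independent of any particular framework; a real plugin would likely report the failure through its own configuration validation mechanism.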
Design / Implementation Tips
- For testing purposes, tables can be created either using the AWS CLI or using Java code: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/JavaDocumentAPITablesExample.html
- AWS DynamoDB CLI reference: http://docs.aws.amazon.com/cli/latest/reference/dynamodb/
- Java example for CRUD operations: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/batch-operation-document-api-java.html
- Please reuse/modify the RecordReader and InputFormat classes present here: https://github.com/awslabs/emr-dynamodb-connector
Design
Approach(s)
- A dropdown listing the regions will be provided so that the user can select the AWS region for DynamoDB to connect to. Supported regions are:
"us-gov-west-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2", "eu-west-1", "eu-west-2", "eu-central-1", "ap-south-1", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ap-northeast-2", "sa-east-1", "cn-north-1", "ca-central-1", "getCurrentRegion". (Referred from: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/ and http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region)
- If the user does not select a region, the default region, us-west-2, will be used.
- The getCurrentRegion entry in the list returns a Region object representing the region the application is running in, when running in EC2. If this method is called from a non-EC2 environment, it returns null.
- The plugin will support the following CDAP data types in the schema: String, Number (int, long, float, double), Bytes, Boolean, NULL, Map, List, Array of String and Array of Number.
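The supported types listed above line up with DynamoDB's attribute type codes (S, N, B, BOOL, NULL, M, L, SS, NS). The following is a minimal sketch of one possible mapping, working on plain Java values rather than CDAP schema objects. The class name is hypothetical, and mapping "Array of String" and "Array of Number" to the string set (SS) and number set (NS) types is an assumption, not something this page specifies.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the plugin: picks a DynamoDB attribute type code for a value. */
final class DynamoDbTypeSketch {
  private DynamoDbTypeSketch() { }

  static String attributeTypeFor(Object value) {
    if (value == null) {
      return "NULL";
    }
    if (value instanceof String) {
      return "S";
    }
    if (value instanceof Number) {
      return "N";    // int, long, float and double all map to the single Number type
    }
    if (value instanceof Boolean) {
      return "BOOL";
    }
    if (value instanceof byte[] || value instanceof ByteBuffer) {
      return "B";
    }
    if (value instanceof Map) {
      return "M";
    }
    if (value instanceof List) {
      return "L";
    }
    if (value instanceof String[]) {
      return "SS";   // assumption: arrays of strings are written as string sets
    }
    if (value instanceof Number[]) {
      return "NS";   // assumption: arrays of boxed numbers are written as number sets
    }
    throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
  }
}
```

If sets are used, note that DynamoDB sets cannot contain duplicate elements, so an array with repeated values would need to be de-duplicated or written as a list instead.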
Properties
- endpointUrl: AWS endpoint (see http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region). This could be reconstructed using regionId.
- regionId: The region for AWS DynamoDB to connect to.
- accessKey: Access key for AWS DynamoDB.
- secretAccessKey: Secret access key for AWS DynamoDB.
- tableName: The table to write the data to.
- primaryKey: The field name to be used as the primary key.
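As a rough illustration of how endpointUrl could be reconstructed from regionId, the sketch below builds the public DynamoDB endpoint from a region id. The class and method names are hypothetical. It assumes the standard endpoint pattern dynamodb.<region>.amazonaws.com (with the .com.cn suffix for China regions), falls back to us-west-2 when no region is given, and does not handle the getCurrentRegion option, which would need an EC2 lookup.

```java
/** Hypothetical helper, not part of the plugin: resolves the DynamoDB endpoint to connect to. */
final class DynamoDbEndpointSketch {
  // Default region used when the user does not select one.
  static final String DEFAULT_REGION = "us-west-2";

  private DynamoDbEndpointSketch() { }

  static String endpointFor(String endpointUrl, String regionId) {
    // An explicitly configured endpoint always wins.
    if (endpointUrl != null && !endpointUrl.trim().isEmpty()) {
      return endpointUrl.trim();
    }
    String region = (regionId == null || regionId.trim().isEmpty())
        ? DEFAULT_REGION
        : regionId.trim();
    // China regions are served from the amazonaws.com.cn domain.
    String suffix = region.startsWith("cn-") ? "amazonaws.com.cn" : "amazonaws.com";
    return "https://dynamodb." + region + "." + suffix;
  }
}
```

For example, endpointFor(null, "eu-west-1") yields https://dynamodb.eu-west-1.amazonaws.com, while a non-empty endpointUrl is returned unchanged, which is useful for pointing tests at a local DynamoDB instance.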
Security
- The AWS access keys should be password fields with macros enabled.
Limitation(s)
Future Work
Test Case(s)
Sample Pipeline
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature