Introduction
A batch source that ingests data from DynamoDB into Hydrator pipelines.
Use case(s)
- An organization dumps all of its spam emails into a DynamoDB table. As a data scientist, I want to train my machine learning models in Hydrator pipelines on the data from the DynamoDB tables.
User Story(s)
- User should be able to provide the table name in DynamoDB.
- User should be able to provide the AWS endpoint URL for the DynamoDB instance.
- User should be able to provide the AWS region id for the DynamoDB instance.
- User should be able to provide the throughput for the DynamoDB instance. (DynamoDB charges are incurred based on throughput, so the user should be able to control it.)
- User should be able to provide the AWS access id.
- User should be able to provide the AWS access key.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Table name | String | Name of the DynamoDB table | Naming convention constraints from AWS |
endpoint url | String | AWS endpoint URL for the DynamoDB instance | Optional, could be reconstructed using regionId |
region id | String | AWS region id for the DynamoDB instance | |
throughput | Int | Intended throughput for DynamoDB | (Optional) |
access id | String | AWS access id | |
access key | password | AWS access key | |
query | String | Query to get the data | |
partition key | String | Partition key to get the data | |
sort key | String | Sort key to refine/sort the fetched data | (Optional) |
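The "naming convention constraints from AWS" for the table name can be checked before the pipeline runs. The following is a minimal sketch, assuming the DynamoDB rule that a table name is 3 to 255 characters long and contains only letters, digits, underscore, hyphen, and dot. The class and method names are hypothetical and not part of the plugin.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin: validates a DynamoDB table name.
final class DynamoDbTableNames {
    // Assumed AWS rule: 3 to 255 characters from a-z, A-Z, 0-9, '_', '-', '.'
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_.-]{3,255}");

    private DynamoDbTableNames() {}

    // Returns true when the name is non-null and matches the naming rule.
    static boolean isValid(String tableName) {
        return tableName != null && VALID.matcher(tableName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Movies"));   // a well-formed name
        System.out.println(isValid("ab"));       // too short
    }
}
```

A check like this could run at configuration time so that a malformed table name is reported before any request is sent to AWS.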
Design / Implementation Tips
- For testing purposes, tables can be created using either the AWS CLI or Java code: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/JavaDocumentAPITablesExample.html
- AWS DynamoDB CLI reference: http://docs.aws.amazon.com/cli/latest/reference/dynamodb/
- Java example for CRUD operations: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/batch-operation-document-api-java.html
- Java example for working with queries: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryingJavaDocumentAPI.html
- Please reuse/modify the RecordReader and InputFormat classes present here: https://github.com/awslabs/emr-dynamodb-connector
Design
We will provide a dropdown listing the supported regions, so the user can select the AWS DynamoDB region to connect to.
DynamoDB JSON format:
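Because the endpoint URL is optional and can be reconstructed from the region id, the selected region is enough to locate the service. A minimal sketch of that fallback is shown here, assuming the standard public endpoint pattern `https://dynamodb.<region>.amazonaws.com` (regions in other AWS partitions, such as China, use a different domain suffix and are not handled). The class and method names are hypothetical.

```java
// Hypothetical helper, not part of the plugin: picks the endpoint to connect to.
final class DynamoDbEndpoints {
    private DynamoDbEndpoints() {}

    // Uses the explicit endpoint URL when one is given; otherwise derives the
    // endpoint from the region id (assumption: standard public AWS regions only).
    static String endpointFor(String regionId, String endpointUrl) {
        if (endpointUrl != null && !endpointUrl.trim().isEmpty()) {
            return endpointUrl.trim();
        }
        if (regionId == null || regionId.trim().isEmpty()) {
            throw new IllegalArgumentException("Either endpointUrl or regionId must be set");
        }
        return "https://dynamodb." + regionId.trim() + ".amazonaws.com";
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("us-east-1", null));
        System.out.println(endpointFor("us-east-1", "localhost:8000"));
    }
}
```

Letting an explicit endpoint win over the derived one keeps local testing possible, for example against a DynamoDB instance listening on localhost.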
{
"name": "DynamoDb",
"type": "batchsource",
"properties": {
"accessKey": "xyz",
"secretAccessKey": "abc",
"regionId": "us-east-1",
"endpointUrl": "localhost:8000",
"tableName": "Movies",
"throughput": "10",
"query": "Id = :v_Id",
"queryArgumentsMap": ":v_Id=198",
"partitionKey": "Id",
"sortKey": "salary"
}
}
Approach(es)
Properties
- endpointUrl: AWS endpoint URL (see http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region). Optional; it can be reconstructed from regionId.
- regionId: The AWS DynamoDB region to connect to.
- accessKey: Access key for AWS DynamoDB.
- secretAccessKey: Secret access key for AWS DynamoDB.
- tableName: The table to read the data from.
- throughput: Intended throughput for AWS DynamoDB.
- query: The query that fetches the data from the table.
- queryArgumentsMap: Comma-separated list of the argument identifiers used in "query", each paired with its value. Key and value are separated by an equals sign, for example ":v_Id=198".
- partitionKey: Partition key used to fetch the data from the table.
- sortKey: Sort key used to sort/refine the data fetched from the table.
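The queryArgumentsMap format described above (comma-separated pairs, key and value joined by an equals sign) could be parsed along the following lines. This is a sketch only: the class and method names are hypothetical, and it assumes that neither keys nor values contain a comma and that the first equals sign in a pair separates key from value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: parses "k1=v1,k2=v2" into a map.
final class QueryArguments {
    private QueryArguments() {}

    // Returns the arguments in declaration order; an empty or null input yields an empty map.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String pair : raw.split(",")) {
            int eq = pair.indexOf('=');  // the first '=' separates key from value
            if (eq < 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            String key = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (key.isEmpty()) {
                throw new IllegalArgumentException("Missing argument identifier in: " + pair);
            }
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(":v_Id=198"));
    }
}
```

The resulting map could then be used to bind each identifier that appears in the query, such as ":v_Id", to its value when the request is built.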
Security
- The AWS access keys should be password fields and should be macro-enabled.
Limitation(s)
Future Work
Test Case(s)
Sample Pipeline
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature