Introduction
A batch source that ingests data from DynamoDB into Hydrator pipelines.
Use case(s)
- An organization dumps all of its spam emails into a DynamoDB table. As a data scientist, I want to train my machine learning models in Hydrator pipelines using the data from that table.
User Stories
- The user should be able to provide the name of the DynamoDB table.
- The user should be able to provide the AWS endpoint URL for the DynamoDB instance.
- The user should be able to provide the AWS region ID for the DynamoDB instance.
- The user should be able to provide the throughput for the DynamoDB instance. (DynamoDB charges are based on throughput, so the user should be able to control it.)
- The user should be able to provide the AWS access ID.
- The user should be able to provide the AWS access key.
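The throughput story can be illustrated with a minimal pacing sketch. It assumes the reader learns how many capacity units each read batch consumed (DynamoDB can report this per request) and pauses so that average consumption stays at or below the configured throughput. The function name `pause_seconds` and its parameters are hypothetical and are not part of the plugin.

```python
def pause_seconds(consumed_units: float,
                  elapsed_seconds: float,
                  target_units_per_second: float) -> float:
    """Return how long to wait so average consumption stays within the target.

    Hypothetical helper for illustration only; the plugin's real
    throttling mechanism is not specified in this document.
    """
    if target_units_per_second <= 0:
        raise ValueError("target throughput must be positive")
    # Time the consumed units should have taken at the target rate.
    required_seconds = consumed_units / target_units_per_second
    # Wait only for the part of that time that has not already passed.
    return max(0.0, required_seconds - elapsed_seconds)
```

For example, if a batch consumed 50 units in 2 seconds and the configured throughput is 10 units per second, the reader would wait a further 3 seconds before the next request.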
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Table name | String | Name of the DynamoDB table | Naming convention constraints from AWS |
endpoint url | String | AWS endpoint URL for the DynamoDB instance | Constraints from AWS |
region id | String | AWS region ID for the DynamoDB instance | |
throughput | Int | Intended throughput for DynamoDB | (Optional) |
access id | String | AWS access ID | |
access key | password | AWS access key | |
query | String | Query to get the data | |
Design / Implementation Tips
- Please refer to our MongoDB plugins.
- Please refer to https://github.com/awslabs/emr-dynamodb-connector
Design
We will provide a dropdown listing the supported regions, so that the user can select the AWS region of the DynamoDB instance to connect to.
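As a rough sketch of how the dropdown values could be used, the snippet below keeps a small list of region IDs and derives a default endpoint from the selected one. The list is a sample rather than the full set of supported regions. The `dynamodb.<region>.amazonaws.com` host pattern is assumed to apply to the standard commercial AWS regions only. The names `SAMPLE_REGIONS` and `default_endpoint` are hypothetical.

```python
# Sample of AWS region IDs for the dropdown (NOT an exhaustive list).
SAMPLE_REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1"]


def default_endpoint(region_id: str) -> str:
    """Return a host:port endpoint for the selected region.

    Assumption: standard commercial regions follow the
    dynamodb.<region>.amazonaws.com pattern and are reached over HTTPS (443).
    """
    if region_id not in SAMPLE_REGIONS:
        raise ValueError(f"unsupported region: {region_id}")
    return f"dynamodb.{region_id}.amazonaws.com:443"
```

Restricting the input to a fixed list means that a mistyped region is rejected when the pipeline is configured rather than when it runs.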
Approach(s)
Properties
- endpointUrl: The hostname and port of the DynamoDB instance to connect to, separated by a colon. For example, localhost:8000.
- regionId: The AWS region of the DynamoDB instance to connect to.
- accessKey: Access key for DynamoDB.
- secretAccessKey: Secret access key for DynamoDB.
- tableName: The table to read the data from.
- throughput: Intended throughput for DynamoDB.
- query: The query that fetches the data from the table.
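The properties above could be checked at configuration time along the following lines. This is a minimal sketch, not the plugin's actual validation. It assumes that every property except throughput is required (the Configurables table marks only throughput as optional), and it applies the AWS table naming rule of 3 to 255 characters drawn from letters, digits, underscore, hyphen, and dot. The `validate` function and the `REQUIRED` tuple are hypothetical names.

```python
import re

# Assumption: everything except throughput is mandatory.
REQUIRED = ("endpointUrl", "regionId", "accessKey",
            "secretAccessKey", "tableName", "query")

# AWS table naming rule: 3-255 chars from a-z, A-Z, 0-9, '_', '-', '.'.
_TABLE_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def validate(props: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name in REQUIRED:
        if not props.get(name):
            errors.append(f"{name} is required")

    table = props.get("tableName")
    if table and not _TABLE_NAME.fullmatch(table):
        errors.append("tableName violates the AWS naming rules")

    endpoint = props.get("endpointUrl")
    if endpoint:
        # Expect host:port, for example localhost:8000.
        host, sep, port = endpoint.rpartition(":")
        if not sep or not host or not port.isdigit():
            errors.append("endpointUrl must look like host:port")

    throughput = props.get("throughput")
    if throughput is not None and (type(throughput) is not int or throughput <= 0):
        errors.append("throughput must be a positive integer")
    return errors
```

Returning all of the problems at once, rather than raising on the first, lets a pipeline UI show every invalid field in a single pass.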
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature