A batch source that ingests data from DynamoDB into Hydrator pipelines.
Use case(s)
An organization dumps all of its spam emails into a DynamoDB table. As a data scientist, I want to train my machine learning models in Hydrator pipelines on the data from that DynamoDB table.
User Story(s)
User should be able to provide the DynamoDB table name.
User should be able to provide the AWS endpoint URL for the DynamoDB instance.
User should be able to provide the AWS region ID for the DynamoDB instance.
User should be able to provide the intended throughput for the DynamoDB instance. (DynamoDB charges are based on provisioned throughput, so the user should be able to control it.)
User should be able to provide the AWS access key ID.
User should be able to provide the AWS secret access key. (A sketch of how these inputs could be used to construct a DynamoDB client follows this list.)
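As a point of reference, the snippet below is a minimal sketch of how these user-supplied values could be used to build a DynamoDB client with the AWS SDK for Java. It is not the plugin's actual implementation; the class, method, and variable names (endpointUrl, regionId, accessKey, secretAccessKey, tableName) are assumptions taken from the stories above.

    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.client.builder.AwsClientBuilder;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.model.TableDescription;

    public class DynamoDBClientSketch {

      public static AmazonDynamoDB buildClient(String endpointUrl, String regionId,
                                               String accessKey, String secretAccessKey) {
        // Static credentials built from the user-supplied access key ID and secret access key.
        BasicAWSCredentials credentials = new BasicAWSCredentials(accessKey, secretAccessKey);

        // The endpoint URL and region ID come straight from the plugin configuration.
        return AmazonDynamoDBClientBuilder.standard()
          .withCredentials(new AWSStaticCredentialsProvider(credentials))
          .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpointUrl, regionId))
          .build();
      }

      public static long readCapacity(AmazonDynamoDB client, String tableName) {
        // The table's provisioned read capacity bounds how aggressively the source can read
        // without throttling; the configured throughput would be weighed against this value.
        TableDescription table = client.describeTable(tableName).getTable();
        return table.getProvisionedThroughput().getReadCapacityUnits();
      }
    }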
Plugin Type
Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
regionId: The AWS region of the DynamoDB instance to connect to.
accessKey: Access key ID for AWS DynamoDB.
secretAccessKey: Secret access key for AWS DynamoDB.
tableName: The table to read the data from.
throughput: Intended throughput for AWS DynamoDB.
query: The query that will fetch the data from the table. (See the query sketch after this list.)
queryArgumentsMap: Comma-separated list of the argument identifiers used in "query", along with their values. Each key and value is separated by an equals sign.
partitionKey: Partition key that will be used to fetch the data from the table.
sortKey: Sort key that will be used to sort/refine the fetched data from the table.
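The sketch below shows one way the query-related configurables could translate to a DynamoDB query with the AWS SDK for Java. It is only an illustration under assumptions: the example expression, argument map, and attribute names (sender, receivedAt) are hypothetical, and all values are treated as strings for simplicity, whereas the real plugin would have to respect the table's attribute types.

    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import com.amazonaws.services.dynamodbv2.model.QueryRequest;
    import com.amazonaws.services.dynamodbv2.model.QueryResult;

    import java.util.HashMap;
    import java.util.Map;

    public class DynamoDBQuerySketch {

      // Hypothetical values matching the configurables above:
      //   query             = "sender = :sender and receivedAt > :start"
      //   queryArgumentsMap = ":sender=spam@example.com,:start=2017-01-01"
      // where "sender" is the configured partitionKey and "receivedAt" the configured sortKey.
      public static QueryResult runQuery(AmazonDynamoDB client, String tableName,
                                         String query, String queryArgumentsMap) {
        Map<String, AttributeValue> values = new HashMap<>();
        for (String pair : queryArgumentsMap.split(",")) {
          // Each entry is "identifier=value"; identifier and value are separated by an equals sign.
          String[] kv = pair.split("=", 2);
          values.put(kv[0].trim(), new AttributeValue().withS(kv[1].trim()));
        }

        QueryRequest request = new QueryRequest()
          .withTableName(tableName)
          .withKeyConditionExpression(query)
          .withExpressionAttributeValues(values);
        return client.query(request);
      }
    }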
Security
The AWS access key ID and secret access key should be password fields and macro-enabled, as sketched below.
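A minimal sketch of how the secured properties could be declared in the plugin configuration, assuming the standard CDAP PluginConfig and annotations; the class name and field layout are illustrative, not the actual implementation. @Macro makes a field macro-enabled, and the corresponding widget definition would mark both fields as password widgets so their values are masked in the UI.

    import co.cask.cdap.api.annotation.Description;
    import co.cask.cdap.api.annotation.Macro;
    import co.cask.cdap.api.annotation.Name;
    import co.cask.cdap.api.plugin.PluginConfig;

    public class DynamoDBSourceConfig extends PluginConfig {

      @Name("accessKey")
      @Description("Access key ID for AWS DynamoDB.")
      @Macro  // macro-enabled so the value can be supplied at runtime rather than stored in the pipeline
      private String accessKey;

      @Name("secretAccessKey")
      @Description("Secret access key for AWS DynamoDB.")
      @Macro  // macro-enabled; the widget for this field should use the password type
      private String secretAccessKey;
    }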