A batch source that ingests data from DynamoDB into Hydrator pipelines.
Use case(s)
An organization dumps all of its spam emails into a DynamoDB table. As a data scientist, I want to train my machine learning models in Hydrator pipelines using the data from the DynamoDB tables.
User Stories
The user should be able to provide the name of the DynamoDB table.
The user should be able to provide the AWS endpoint URL for the DynamoDB instance.
The user should be able to provide the AWS region id for the DynamoDB instance.
The user should be able to provide the throughput for the DynamoDB instance. (DynamoDB charges are based on throughput, so the user should be able to control it.)
The user should be able to provide the AWS access id.
The user should be able to provide the AWS access key.
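To illustrate why the throughput setting matters for cost, the sketch below estimates the read capacity a pipeline would consume. It assumes the standard DynamoDB accounting, where one read capacity unit covers one strongly consistent read per second of an item up to 4 KB and an eventually consistent read costs half as much. The function name and parameters are hypothetical and not part of the plugin.

```python
import math


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate the read capacity units needed for a steady read rate.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    # Each read is billed in 4 KB blocks, rounded up, with a minimum of one block.
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))
    total = units_per_read * reads_per_second
    # An eventually consistent read costs half of a strongly consistent one.
    if not strongly_consistent:
        total = math.ceil(total / 2)
    return total
```

For example, reading ten 6000-byte items per second with strong consistency needs two units per read, so a throughput setting of about 20 would be required.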
Plugin Type
Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints
Table name | String | Name of the DynamoDB table | Naming convention constraints from AWS
endpoint url | String | AWS endpoint URL for the DynamoDB instance | Optional, could be reconstructed using regionId
region id | String | AWS region id for the DynamoDB instance | Optional, with default value set as us-west-2
throughput | Int | Intended throughput for DynamoDB | (Optional)
access id | password | AWS access key |
access key | password | AWS secret access key |
query | String | Query to get the data |
filterQuery | String | Query to filter the fetched data before returning it to the client |
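The properties above could be modelled and validated roughly as follows. This is a minimal sketch, not the plugin's actual configuration class: the class name, the field names, and the choice of which fields are mandatory are assumptions. The table name check follows the AWS naming rule (3 to 255 characters drawn from letters, digits, underscore, hyphen, and dot).

```python
import re
from dataclasses import dataclass
from typing import Optional

# AWS naming rule for DynamoDB tables: 3-255 characters from a-z, A-Z, 0-9, '_', '-', '.'.
TABLE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{3,255}")


@dataclass
class DynamoDBSourceConfig:
    """Hypothetical container for the configurable properties."""
    table_name: str
    access_id: str
    access_key: str
    query: str
    filter_query: Optional[str] = None
    endpoint_url: Optional[str] = None   # optional, can be derived from region_id
    region_id: str = "us-west-2"         # default region
    throughput: Optional[int] = None     # optional

    def validate(self) -> None:
        """Raise ValueError if a property violates its constraint."""
        if not TABLE_NAME_PATTERN.fullmatch(self.table_name):
            raise ValueError(f"Invalid DynamoDB table name: {self.table_name!r}")
        if not self.access_id or not self.access_key:
            raise ValueError("Both the access id and the access key are required")
        if self.throughput is not None and self.throughput <= 0:
            raise ValueError("Throughput must be a positive integer")
```

Keeping validation in one method lets a pipeline reject a bad configuration before any connection to AWS is attempted.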
A dropdown listing the supported regions will be provided so that the user can select the AWS region of the DynamoDB instance to connect to. Supported regions are: "us-gov-west-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2", "eu-west-1", "eu-west-2", "eu-central-1", "ap-south-1", "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "ap-northeast-2", "sa-east-1", "cn-north-1", "ca-central-1", "getCurrentRegion". (Referred from: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/ http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region)
If the user does not select a region, the default region, us-west-2, is used.
The getCurrentRegion option returns a Region object representing the region the application is running in when it runs on EC2. If it is called from a non-EC2 environment, it returns null.
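The region default and the reconstruction of the endpoint URL from the region id could work along these lines. This is a sketch under assumptions: the function and constant names are hypothetical, the region ids use the hyphenated form from the supported list, and the endpoint follows the usual dynamodb.<region>.amazonaws.com pattern (with the .cn suffix for China regions). getCurrentRegion is left out because it depends on the EC2 environment.

```python
from typing import Optional

# Region ids from the supported list, excluding the EC2-only getCurrentRegion option.
SUPPORTED_REGIONS = (
    "us-gov-west-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2",
    "eu-west-1", "eu-west-2", "eu-central-1", "ap-south-1", "ap-southeast-1",
    "ap-southeast-2", "ap-northeast-1", "ap-northeast-2", "sa-east-1",
    "cn-north-1", "ca-central-1",
)
DEFAULT_REGION = "us-west-2"


def resolve_endpoint(region_id: Optional[str] = None,
                     endpoint_url: Optional[str] = None) -> str:
    """Return the endpoint to connect to (hypothetical helper).

    An explicit endpoint URL wins; otherwise the URL is derived from the
    region id, falling back to the default region when none is selected.
    """
    if endpoint_url:
        return endpoint_url
    region = region_id or DEFAULT_REGION
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Unsupported region: {region!r}")
    # China regions are served from the amazonaws.com.cn domain.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"https://dynamodb.{region}.{suffix}"
```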
The user provides the complete query (fields and their values) through the “Query” widget; this query is used to fetch the data. For example: year = 1985 and rating > 5
If a filter query is needed when fetching the data, it can be provided through the “Filter Query” widget.
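How the plugin parses the query text is not specified here. As one possible approach, the sketch below turns a query such as the example above into a DynamoDB-style expression with placeholders for attribute names and values, which avoids clashes with reserved words such as "year". The function names are hypothetical, and the parser only handles simple comparisons joined by "and".

```python
import re

# One comparison: field name, operator, literal value.
CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|<>|=|<|>)\s*(.+?)\s*$")


def parse_value(raw: str):
    """Convert a literal to int or float when possible, else a plain string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw.strip("'\"")


def to_expression(query: str):
    """Split 'a = 1 and b > 2' into an expression plus name and value maps.

    Hypothetical helper; does not support 'or', parentheses, or string
    values that themselves contain the word 'and'.
    """
    names, values, parts = {}, {}, []
    clauses = re.split(r"\s+and\s+", query, flags=re.IGNORECASE)
    for i, clause in enumerate(clauses):
        match = CONDITION.match(clause)
        if match is None:
            raise ValueError(f"Cannot parse condition: {clause!r}")
        field, operator, raw = match.groups()
        names[f"#n{i}"] = field
        values[f":v{i}"] = parse_value(raw)
        parts.append(f"#n{i} {operator} :v{i}")
    return " and ".join(parts), names, values
```

The same helper could be applied to the text of the “Filter Query” widget, since a filter uses the same comparison syntax but is evaluated on the fetched items before they are returned.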