Requirements
- Operations
- Perform single + batch reads on single + multiple datasets from script transform
- Perform single + batch reads on single + multiple files from script transform
- Supported tables for lookup
- KeyValueTable dataset
- ObjectMappedTable dataset
- CSV files treated as a list of key-value pairs
- Optional caching with time-based expiration
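The "CSV files treated as a list of key-value pairs" requirement could look roughly like the sketch below. It is only an illustration under stated assumptions: `CsvLookup` is a hypothetical class name, each line is assumed to be `key,value` with no header row and no quoting, and the first comma is assumed to separate the key from the value. The `LookupKV` signatures are taken from the Design section.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Lookup contract, using the method signatures from the Design section.
interface LookupKV {
  Object lookup(String key);
  Map<String, Object> multiLookup(String[] keys);
}

// Hypothetical CSV-backed table (not part of the original design).
// Assumes "key,value" lines, no header, no quoting; the first comma is the separator.
class CsvLookup implements LookupKV {
  private final Map<String, Object> entries = new HashMap<>();

  CsvLookup(Reader source) {
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        int comma = line.indexOf(',');
        if (comma < 0) {
          continue; // skip blank lines and lines without a separator
        }
        String key = line.substring(0, comma).trim();
        String value = line.substring(comma + 1).trim();
        entries.put(key, value); // a later duplicate key overwrites an earlier one
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public Object lookup(String key) {
    return entries.get(key); // null when the key is absent
  }

  @Override
  public Map<String, Object> multiLookup(String[] keys) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (String key : keys) {
      result.put(key, entries.get(key)); // absent keys map to null
    }
    return result;
  }
}
```

Loading the whole file into memory keeps lookups simple; a large file might need a different strategy, which this sketch does not address.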
Design
LookupKV interface
Object lookup(String key);
Map<String, Object> multiLookup(String[] keys);
- Implement LookupKV in KeyValueTable and ObjectMappedTable
- ScriptTransform changes
Add a configuration property that declares the lookup tables to use, along with the properties for each table (e.g. dataset properties):
"tables": [
  {
    "name": "purchases",
    "type": "dataset",
    "properties": {
      "dataset": "purchases",
      "properties": {.. dataset properties ..},
      "enableCache": "true",
      "cacheExpiry": 1234
    }
  },
  {
    "name": "ip2geo",
    "type": "file",
    "properties": {
      "file": "/data/ip2geo.csv"
    }
  }
]
- configure(): verify tables (datasets and files) exist
- transform(): execute lookup methods in a transaction, provide LookupKV instance to script
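The optional caching with time-based expiration (the `enableCache` and `cacheExpiry` properties) might be layered on top of any `LookupKV` as a wrapper. The following is a minimal sketch, not the actual implementation: `CachingLookup` and its constructor are hypothetical, the expiry is assumed to be in milliseconds (the source does not state a unit), and the clock is injected so the behavior can be tested without sleeping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Lookup contract, using the method signatures from the Design section.
interface LookupKV {
  Object lookup(String key);
  Map<String, Object> multiLookup(String[] keys);
}

// Hypothetical caching wrapper (an assumption, not part of the original design).
// Not thread-safe; expiry is assumed to be expressed in milliseconds.
class CachingLookup implements LookupKV {
  private static final class Entry {
    final Object value;
    final long expiresAt;

    Entry(Object value, long expiresAt) {
      this.value = value;
      this.expiresAt = expiresAt;
    }
  }

  private final LookupKV delegate;
  private final long expiryMillis;
  private final LongSupplier clock; // e.g. System::currentTimeMillis
  private final Map<String, Entry> cache = new HashMap<>();

  CachingLookup(LookupKV delegate, long expiryMillis, LongSupplier clock) {
    this.delegate = delegate;
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  @Override
  public Object lookup(String key) {
    long now = clock.getAsLong();
    Entry cached = cache.get(key);
    if (cached != null && now < cached.expiresAt) {
      return cached.value; // still fresh
    }
    Object value = delegate.lookup(key);
    cache.put(key, new Entry(value, now + expiryMillis));
    return value;
  }

  @Override
  public Map<String, Object> multiLookup(String[] keys) {
    long now = clock.getAsLong();
    Map<String, Object> result = new LinkedHashMap<>();
    List<String> misses = new ArrayList<>();
    for (String key : keys) {
      Entry cached = cache.get(key);
      if (cached != null && now < cached.expiresAt) {
        result.put(key, cached.value);
      } else {
        misses.add(key);
      }
    }
    if (!misses.isEmpty()) {
      // Fetch all missing or expired keys from the delegate in one batch call.
      Map<String, Object> fetched = delegate.multiLookup(misses.toArray(new String[0]));
      for (String key : misses) {
        Object value = fetched.get(key);
        cache.put(key, new Entry(value, now + expiryMillis));
        result.put(key, value);
      }
    }
    return result;
  }
}
```

In this sketch a table configured with `"enableCache":"true"` would be wrapped as `new CachingLookup(table, cacheExpiry, System::currentTimeMillis)`, while tables without caching would be handed to the script directly.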
Options for lookup usage:
var result = context.getTable("purchases").lookup(user);
// or
var result = tables["purchases"].lookup(user);
// or
var result = purchases.lookup(user);
Options for multiLookup usage:
var result = purchases.multiLookup(["alice", "bob"]);
// do something with result["alice"]
// do something with result["bob"]