...
- Operations
- Perform single and batch reads on one or more datasets from a script transform
- Perform single and batch reads on one or more DistributedCache files from a script transform
- Supported tables for lookup
- Key-value table
- KeyValueTable dataset
- ObjectMappedTable dataset
- CSV files treated as a list of key-value pairs
- Optional caching with time-based expiration
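The optional time-based caching listed above could look roughly like the following sketch. It is not part of the proposed API: `ExpiringLookupCache`, its `loader` function, and the injected `clock` are hypothetical names, and the expiry is assumed to be in milliseconds (the design does not state a unit). The sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration; not part of the proposed API.
class ExpiringLookupCache<T> {

  // A cached value together with the time at which it was loaded.
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Function<String, T> loader;   // performs the real lookup
  private final long expiryMillis;            // assumed unit: milliseconds
  private final LongSupplier clock;           // injectable for testing
  private final Map<String, Entry<T>> entries = new HashMap<>();

  ExpiringLookupCache(Function<String, T> loader, long expiryMillis, LongSupplier clock) {
    this.loader = loader;
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  // Returns the cached value, reloading it if it is missing or has expired.
  T get(String key) {
    long now = clock.getAsLong();
    Entry<T> entry = entries.get(key);
    if (entry == null || now - entry.loadedAtMillis >= expiryMillis) {
      entry = new Entry<>(loader.apply(key), now);
      entries.put(key, entry);
    }
    return entry.value;
  }
}
```

In production the clock would typically be `System::currentTimeMillis`; passing it in keeps the expiry logic testable without sleeping.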
Design
Lookup interface
    interface Lookup<T> {
      T lookup(String key);
      Map<String, T> lookup(String... keys);
      Map<String, T> lookup(Set<String> keys);
    }
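To make the single and batch variants concrete, here is a minimal in-memory sketch of the interface. `InMemoryLookup` is a hypothetical class used only for illustration; the real implementations would be backed by datasets. Returning `null` for missing keys is an assumption, since the design does not specify that behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The interface as proposed in this design.
interface Lookup<T> {
  T lookup(String key);
  Map<String, T> lookup(String... keys);
  Map<String, T> lookup(Set<String> keys);
}

// Hypothetical in-memory implementation, for illustration only.
class InMemoryLookup<T> implements Lookup<T> {
  private final Map<String, T> data;

  InMemoryLookup(Map<String, T> data) {
    this.data = data;
  }

  @Override
  public T lookup(String key) {
    // Assumption: a missing key yields null.
    return data.get(key);
  }

  @Override
  public Map<String, T> lookup(String... keys) {
    // Delegate to the Set variant, preserving the caller's key order.
    Set<String> keySet = new LinkedHashSet<>(Arrays.asList(keys));
    return lookup(keySet);
  }

  @Override
  public Map<String, T> lookup(Set<String> keys) {
    Map<String, T> result = new LinkedHashMap<>();
    for (String key : keys) {
      result.put(key, data.get(key));
    }
    return result;
  }
}
```

A call with one `String` argument resolves to the single-key method, because Java only considers the varargs overload when no fixed-arity method applies.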
- Implement Lookup in KeyValueTable and ObjectMappedTable
- KeyValueTable implements Lookup<String>
- ObjectMappedTable implements Lookup<StructuredRecord>
- DatasetConfigurer changes
- Add method: void useDataset(String datasetName);
- ScriptTransform changes
- Add a configuration property that declares the lookup tables to use, along with properties for each table (e.g. dataset properties)
Example
    "tables": [
      {
        "name": "purchases",
        "type": "dataset",
        "properties": {
          "dataset": "purchases",
          "properties": {.. dataset properties ..},
          "enableCache": "true",
          "cacheExpiry": 1234
        }
      },
      {
        "name": "ip2geo",
        "type": "file",
        "properties": {"file": "/data/ip2geo.csv"}
      }
    ]
- configure(): verify that the tables (datasets and files) exist by calling DatasetConfigurer.useDataset()
- transform(): execute lookup methods in a transaction and provide a Lookup instance to the script
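For file-backed tables such as a CSV file treated as a list of key-value pairs, the loading step might be sketched as below. `CsvKeyValueLoader` is a hypothetical name, and the parsing rules are assumptions the design does not spell out: the first column is the key, everything after the first comma is the value, there is no header row and no quoting, and blank or comma-less lines are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; the parsing rules are assumptions.
class CsvKeyValueLoader {

  // Turns CSV lines into a key-value map: key = text before the first comma,
  // value = everything after it. Later duplicates overwrite earlier ones.
  static Map<String, String> parse(Iterable<String> lines) {
    Map<String, String> table = new LinkedHashMap<>();
    for (String line : lines) {
      int comma = line.indexOf(',');
      if (comma < 0) {
        continue; // skip blank lines and lines without a key-value separator
      }
      String key = line.substring(0, comma).trim();
      String value = line.substring(comma + 1).trim();
      table.put(key, value);
    }
    return table;
  }
}
```

The lines could come from any source, for example `java.nio.file.Files.readAllLines` on a local copy of the file.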
- Sample
Options for lookup usage:
    var result = context.getLookup("purchases").lookup(user);
- Alternative: tables["purchases"].lookup(user)
- Alternative: purchases.lookup(user)
Options for batch lookup usage:
    var result = context.getLookup("purchases").lookup(["alice", "bob"]);
    // do something with result["alice"]
    // do something with result["bob"]
- Sample
...