...

  1. Operations
    1. Perform single and batch reads on one or more datasets from a script transform
    2. Perform single and batch reads on one or more DistributedCache files from a script transform
  2. Supported tables for lookup
    1. KeyValueTable dataset
    2. ObjectMappedTable dataset
    3. CSV files treated as a list of key-value pairs
  3. Optional caching with time-based expiration
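The optional caching requirement could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the class name `ExpiringLookupCache`, the injected clock, and the choice of milliseconds as the expiry unit are all assumptions made for the example, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of the design): caches single-key lookup
// results and reloads an entry once it is older than a fixed expiry.
class ExpiringLookupCache<T> {

  // A cached value together with the time at which it was loaded.
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Function<String, T> delegate; // the uncached lookup
  private final long expiryMillis;            // assumed unit: milliseconds
  private final LongSupplier clock;           // injected so tests can control time
  private final Map<String, Entry<T>> cache = new HashMap<>();

  ExpiringLookupCache(Function<String, T> delegate, long expiryMillis, LongSupplier clock) {
    this.delegate = delegate;
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  // Returns the cached value if it is still fresh; otherwise calls the
  // delegate, stores the result with the current time, and returns it.
  T lookup(String key) {
    long now = clock.getAsLong();
    Entry<T> entry = cache.get(key);
    if (entry == null || now - entry.loadedAtMillis >= expiryMillis) {
      entry = new Entry<>(delegate.apply(key), now);
      cache.put(key, entry);
    }
    return entry.value;
  }
}
```

In production code the clock would typically be `System::currentTimeMillis`; passing it in keeps the expiry behavior easy to test.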

Design

  1. Lookup interface

    interface Lookup<T> {
      T lookup(String key);
      Map<String, T> lookup(String... keys);
      Map<String, T> lookup(Set<String> keys);
    }
  2. Implement Lookup in KeyValueTable and ObjectMappedTable
    1. KeyValueTable implements Lookup<String>
    2. ObjectMappedTable implements Lookup<StructuredRecord>
  3. DatasetConfigurer changes
    1. Add method: void useDataset(String datasetName);
  4. ScriptTransform changes
    1. Add configuration property for declaring lookup tables to use, properties for each table (e.g. dataset properties)

      Example

      "tables": [
        {
          "name":"purchases",
          "type":"dataset",
          "properties": {
            "dataset":"purchases",
            "properties": {.. dataset properties ..},
            "enableCache":"true",
            "cacheExpiry":1234
          }
        },
        {"name":"ip2geo", "type":"file", "properties":{"file":"/data/ip2geo.csv"}}
      ]
    2. configure(): verify that tables (datasets and files) exist by calling DatasetConfigurer.useDataset()
    3. transform(): execute lookup methods in a transaction, provide a Lookup instance to the script
      1. Options for lookup usage:

        var result = context.getLookup("purchases").lookup(user);
      2. Alternative: tables["purchases"].lookup(user)
      3. Alternative: purchases.lookup(user)
      4. Options for batch lookup usage:

        var result = context.getLookup("purchases").lookup(["alice", "bob"]);
        // do something with result["alice"]
        // do something with result["bob"]
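To make the single and batch semantics of the Lookup interface concrete, here is a minimal in-memory sketch. The class `MapLookup` is hypothetical and stands in for the real KeyValueTable and ObjectMappedTable implementations. Mapping missing keys to `null` in batch results is an assumption of this example; the design above does not specify that behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The interface as given in the Design section.
interface Lookup<T> {
  T lookup(String key);
  Map<String, T> lookup(String... keys);
  Map<String, T> lookup(Set<String> keys);
}

// Hypothetical in-memory implementation, for illustration only.
class MapLookup<T> implements Lookup<T> {
  private final Map<String, T> data;

  MapLookup(Map<String, T> data) {
    this.data = new HashMap<>(data); // defensive copy
  }

  // Single read: returns null when the key is absent.
  @Override
  public T lookup(String key) {
    return data.get(key);
  }

  // Batch read (varargs): delegates to the Set-based overload.
  @Override
  public Map<String, T> lookup(String... keys) {
    Set<String> keySet = new LinkedHashSet<>(Arrays.asList(keys));
    return lookup(keySet);
  }

  // Batch read: one entry per requested key.
  // Assumption: absent keys are included and mapped to null.
  @Override
  public Map<String, T> lookup(Set<String> keys) {
    Map<String, T> result = new HashMap<>();
    for (String key : keys) {
      result.put(key, data.get(key));
    }
    return result;
  }
}
```

A call with exactly one string argument, such as `purchases.lookup("alice")`, resolves to the single-key method rather than the varargs overload, so the three methods can share one name without ambiguity on the Java side.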

...