...

Jira: CDAP-4230 (Cask Community Issue Tracker)

Use case 2 (selector):

The pipeline reads from Twitter. If the tweet is in English, we want to run it through an English categorizer transform. If not, we want to send it to a translate transform before sending it on to the categorizer transform. If the translate transform is unable to translate the tweet, we want to write the record to an error dataset. Also, if the tweet is categorized as spam, we want to write the record to the error dataset. Otherwise, we want to write our translated, categorized tweet to a Table. This could be represented purely with forks:

...

This seems unnatural, though. After every fork, we have a filter on both paths, with each filter the 'not' of the other. It seems more natural to have a selector:

 

Code Block
{
  "connections": [
    {
      "from": "twitter source",
      "to": "language tagger"
    },
    {
      "from": "language tagger",
      "selector": {
        "type": "fieldvalue",
        "field": "lang",
        "switch": {
          // output stage -> field value
          "categorizer": "en"
        },
        // optional. if the value doesn't match anything in outputs, go here.
        // if absent, record is dropped
        "default": "translator"
      }
    },
    {
      "from": "translator",
      "selector": {
        "type": "fieldvalue",
        "field": "lang",
        "switch": {
          "categorizer": "en"
        },
        "default": "invalid tweets table"
      }
    },
    {
      "from": "categorizer",
      "selector": {
        "type": "fieldvalue",
        "script": "function (input) { return !input.spam; }",
        "switch": {
          "categorized tweets table": true,
          "invalid tweets table": false
        }
      }
    }
  ]
}
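
To make the routing semantics concrete, here is a minimal sketch of how a "fieldvalue" selector like the one above might be evaluated for a single record. The `selectOutput` helper is hypothetical and not part of any existing API; it assumes that "switch" maps each output stage to the field value that routes to it, and that a record with no match and no "default" is dropped.

```javascript
// Hypothetical helper (an assumption for illustration, not an existing API):
// resolves the output stage for one record under a "fieldvalue" selector.
function selectOutput(selector, record) {
  if (selector.type !== "fieldvalue") {
    throw new Error("unsupported selector type: " + selector.type);
  }
  const value = record[selector.field];
  // "switch" maps output stage -> field value, so we search by value.
  for (const [stage, expected] of Object.entries(selector.switch)) {
    if (value === expected) {
      return stage;
    }
  }
  // No match: use "default" if present, otherwise drop the record (null).
  return selector.default !== undefined ? selector.default : null;
}

const languageSelector = {
  type: "fieldvalue",
  field: "lang",
  switch: { categorizer: "en" },
  default: "translator",
};

const english = selectOutput(languageSelector, { lang: "en" });
const french = selectOutput(languageSelector, { lang: "fr" });
console.log(english, french);
```

Each record resolves to at most one output stage, which is what distinguishes a selector from a fork.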

A selector would not support a use case where we want a record to go to multiple outputs. For example, suppose we are reading from an employees table, and we want to write the employee salary to a table that groups salaries by several categories. If an employee is over a certain age, we want to look up their retirement plan. If the employee's nationality is x, y, or z, we want to do an immigration status lookup. No matter what, we want to categorize employee performance before writing to the table:

[Image: pipeline diagram]

For a use case like this, we could introduce an 'if' condition to the connection.

Code Block
{
  "connections": [
    {
      "from": "employees table",
      "to": "retirement plan lookup",
      "if": {
        "script": "function (input) { return input.age > 65; }",
        "scriptEngine": "javascript"
      }
    },
    {
      "from": "employees table",
      "to": "immigration status lookup",
      "if": {
        "script": "function (input) { return input.nationality == x || input.nationality == y || input.nationality == z; }",
        "scriptEngine": "javascript"
      }
    },
    {
      "from": "employees table",
      "to": "performance categorizer"
    },
    {
      "from": "performance categorizer",
      "to": "salary by category table"
    },
    {
      "from": "immigration status lookup",
      "to": "salary by category table"
    },
    {
      "from": "retirement plan lookup",
      "to": "salary by category table"
    }
  ]
}

One thing to note is that in this pipeline, an employee who is older than 65 and has nationality x will be sent down all three branches, generating each type of category.
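
As a rough illustration of that behavior, the sketch below evaluates the 'if' conditions on the connections leaving the employees table and returns every stage that should receive a record. It rests on a few assumptions: plain predicate functions stand in for the "script" strings, the nationalities are the literal placeholder strings "x", "y", and "z", and `nextStages` is a hypothetical helper.

```javascript
// Hypothetical sketch: predicates stand in for the "script" strings in the
// config above, and "x", "y", "z" are placeholder nationality values.
const connections = [
  {
    from: "employees table",
    to: "retirement plan lookup",
    if: (input) => input.age > 65,
  },
  {
    from: "employees table",
    to: "immigration status lookup",
    if: (input) => ["x", "y", "z"].includes(input.nationality),
  },
  // No condition: every record follows this connection.
  { from: "employees table", to: "performance categorizer" },
];

// Returns every stage that should receive a record emitted by `stage`.
function nextStages(connections, stage, record) {
  return connections
    .filter((c) => c.from === stage)
    .filter((c) => c.if === undefined || c.if(record))
    .map((c) => c.to);
}

const older = nextStages(connections, "employees table", { age: 70, nationality: "x" });
const younger = nextStages(connections, "employees table", { age: 30, nationality: "q" });
console.log(older, younger);
```

Unlike a selector, conditions are evaluated independently per connection, so one record can be routed to any number of outputs.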

Also note that one side effect of this change would be that the filter transform would no longer be needed. The same thing could also be represented as a fork with a filter on the top and bottom branches, so we may not do this.

 

Realtime Stream source

Note

We may just add the ability to read from a stream to a worker instead of this.

...