...

This seems unnatural, though. After every fork, we have a filter on each path, with each filter being the 'not' of the other. It seems more natural to put conditions on the connections themselves:

 

Code Block
{
  "connections": [
    {
      "from": "twitter source",
      "to": "language tagger"
    },
    {
      "from": "language tagger",
      "to": "categorizer",
      "if": {
        "script": "function (input) { return input.language == 'en'; }",
        "scriptEngine": "javascript",
        "elseTo": "translator"
      }
    },
    {
      "from": "translator",
      "to": "categorizer",
      "if": {
        "script": "lambda input: input['language'] == 'en'",
        "scriptEngine": "jython",
        "elseTo": "invalid tweets table"
      }
    },
    {
      "from": "categorizer",
      "to": "categorized tweets table",
      "if": {
        "script": "function (input) { return !input.spam; }",
        "scriptEngine": "javascript",
        "elseTo": "invalid tweets table"
      }
    }
  ]
}
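
To make the if/else semantics concrete, here is a minimal sketch of how a runtime might decide where a record goes next. Nothing in it is an existing API: `compileCondition` and `nextStages` are hypothetical names, only the "javascript" engine is handled, and the condition object is looked up under either key spelling used on this page.

```javascript
// Hypothetical helper: turn a condition's script string into a callable.
// Only the "javascript" engine is handled in this sketch.
function compileCondition(cond) {
  if (cond.scriptEngine !== "javascript") {
    throw new Error("unsupported script engine: " + cond.scriptEngine);
  }
  // Parentheses make the function expression evaluate to a value.
  return new Function("return (" + cond.script + ");")();
}

// Return the names of the stages that a record leaving `from` is sent to.
function nextStages(connections, from, record) {
  const targets = [];
  for (const conn of connections) {
    if (conn.from !== from) continue;
    const cond = conn["if"] || conn.condition;
    if (!cond) {
      targets.push(conn.to);      // unconditional connection
    } else if (compileCondition(cond)(record)) {
      targets.push(conn.to);      // condition holds
    } else if (cond.elseTo) {
      targets.push(cond.elseTo);  // condition fails and an else branch exists
    }
    // Otherwise the record is simply not sent along this connection.
  }
  return targets;
}

const connections = [
  { from: "twitter source", to: "language tagger" },
  {
    from: "language tagger",
    to: "categorizer",
    if: {
      script: "function (input) { return input.language == 'en'; }",
      scriptEngine: "javascript",
      elseTo: "translator"
    }
  }
];

console.log(nextStages(connections, "language tagger", { language: "en" }).join(", ")); // categorizer
console.log(nextStages(connections, "language tagger", { language: "fr" }).join(", ")); // translator
```

A connection with a condition but no `elseTo` behaves like the old filter: records that fail the condition are not forwarded on that connection.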

This would also allow more complex cases than just if-else. For example, suppose we are reading from an employees table, adding categories to each employee, then writing to a table that will help calculate average salaries per category. Before adding categories, we want to do a retirement plan lookup if the employee's age is greater than 65, and an immigration status lookup if the employee's nationality is x, y, or z:

Image Added

Code Block
{
  "connections": [
    {
      "from": "employees table",
      "to": "retirement plan lookup",
      "condition": {
        "script": "function (input) { return input.age > 65; }",
        "scriptEngine": "javascript"
      }
    },
    {
      "from": "employees table",
      "to": "immigration status lookup",
      "condition": {
        "script": "function (input) { return input.nationality == 'x' || input.nationality == 'y' || input.nationality == 'z'; }",
        "scriptEngine": "javascript"
      }
    },
    {
      "from": "employees table",
      "to": "categorizer"
    },
    {
      "from": "retirement plan lookup",
      "to": "categorizer"
    },
    {
      "from": "immigration status lookup",
      "to": "categorizer"
    },
    {
      "from": "categorizer",
      "to": "salary by category table"
    }
  ]
}

Note that in this pipeline, an employee who is older than 65 and has nationality x will be sent to the categorizer three separate times. If that is not desired, the conditions would need to be written to prevent it.
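
The duplicate delivery can be checked with a small sketch. It makes several assumptions: conditions are plain functions rather than script strings, `countArrivals` is a hypothetical helper, and the graph includes an unconditional path from the employees table straight to the categorizer (that direct path is what brings the count to three). The `exclusive` variant is only one possible way to rewrite the conditions.

```javascript
// Count how many times a record that starts at `source` arrives at `target`.
// Assumes the connection graph has no cycles.
function countArrivals(connections, source, target, record) {
  if (source === target) return 1;
  let count = 0;
  for (const conn of connections) {
    if (conn.from !== source) continue;
    if (conn.condition && !conn.condition(record)) continue;
    count += countArrivals(connections, conn.to, target, record);
  }
  return count;
}

const isRetirementAge = (e) => e.age > 65;
const needsStatusLookup = (e) => ["x", "y", "z"].includes(e.nationality);

// Overlapping conditions: one record can take several paths at once.
const overlapping = [
  { from: "employees table", to: "categorizer" }, // assumed direct path
  { from: "employees table", to: "retirement plan lookup", condition: isRetirementAge },
  { from: "employees table", to: "immigration status lookup", condition: needsStatusLookup },
  { from: "retirement plan lookup", to: "categorizer" },
  { from: "immigration status lookup", to: "categorizer" },
  { from: "categorizer", to: "salary by category table" }
];

// One possible rewrite: every record takes exactly one path out of the source.
const exclusive = [
  { from: "employees table", to: "categorizer",
    condition: (e) => !isRetirementAge(e) && !needsStatusLookup(e) },
  { from: "employees table", to: "retirement plan lookup", condition: isRetirementAge },
  { from: "employees table", to: "immigration status lookup",
    condition: (e) => needsStatusLookup(e) && !isRetirementAge(e) },
  ...overlapping.slice(3) // the lookup-to-categorizer and categorizer-to-table edges
];

const employee = { age: 70, nationality: "x" };
console.log(countArrivals(overlapping, "employees table", "categorizer", employee)); // 3
console.log(countArrivals(exclusive, "employees table", "categorizer", employee));   // 1
```

In the `exclusive` variant this employee is categorized once but skips the immigration status lookup. If both lookups are required, the lookups would probably need to be chained rather than run in parallel.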

A side effect of this change is that the filter transform would no longer be needed.

Realtime Stream source

Note: We may just add the ability for a worker to read from a stream instead of this.

...