Introduction
A rules engine transform will apply predefined rules to incoming data (realtime as well as batch). Rules must be generic enough to allow updates to a dataset, posting to an HTTP endpoint, sending an email, etc.
Use case(s)
- CompanyA wants to develop a streaming pipeline to read and process signals from wearable and non-wearable devices, and apply rules on incoming signals. Based on the rules, it wants to send notifications to configured mobile devices to provide concierge and/or healthcare services.
- CompanyA has a rules management system that can allow users to feed in rules for devices. Rules can state actions to be taken if certain conditions are met in the signals from the provided devices. These rules are stored in a CDAP dataset. A streaming pipeline will then read the rules dataset and apply rules applicable for incoming signals to trigger appropriate notifications.
- CompanyA would like selection criteria for applying rules to incoming signals. Selection criteria could be based on device hierarchy, but could also be arbitrary. Hydrator should allow the user to specify rule selection criteria that select the set of rules to apply to an incoming signal.
User Storie(s)
- As a Hydrator user, I would like to apply rule-sets on the incoming stream to trigger notifications or take any other appropriate actions if necessary.
- As a Hydrator user, I would like to be able to look up rules from a rules repository.
- As a Hydrator user, I would like to attach rules to the rule sets.
- As a Hydrator user, I would like to add rule sets to the rules repository.
- As a Hydrator user, I would like to apply only those rules that are applicable to the incoming record. To achieve this, I would like to specify selection criteria for rules. The selection criteria may be based on the input schema.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
- Transform
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Input Schema | String | Schema of the input data | Specified as a CDAP schema |
Ruleset Repository | String | Name of the dataset to be used as a repository of rules | |
Ruleset Selection Criteria | String | A string, in a grammar yet to be defined, that specifies the logic for selecting rules from the rules dataset | |
Output Schema | String | Schema of the output data, including the fields that hold the rule outputs | Specified as a CDAP schema |
Design / Implementation Tips
- Tip #1
- Tip #2
Design
Approach(s)
- The format of the rules can be in compliance with JEXL (http://commons.apache.org/proper/commons-jexl/).
- Rules in the rule-sets should be added in the order of their priority.
- Field names in the rules should be populated automatically by parsing the rule.
Design
The output fields will be populated based on the output of the application of rules on one or many fields in the input schema.
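As a rough illustration of how output fields could be derived, the sketch below applies an ordered list of rules to one input record. It is written in Python purely for brevity; the plugin itself would presumably evaluate JEXL expressions on the JVM. The `Rule` class, the `apply_rules` function, and the choice to let every rule fire in list order (so a later rule can overwrite an earlier output) are assumptions made for this example, not part of the design.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical in-memory form of one row of the Rules DB."""
    rule_id: int
    description: str
    # Takes the input record and returns the output fields this rule sets.
    evaluate: Callable[[Record], Record]


def apply_rules(record: Record, rules: List[Rule]) -> Record:
    """Apply rules in list order (assumed to be their priority order)."""
    output: Record = {}
    for rule in rules:
        # Assumption: every rule fires; later rules may overwrite earlier ones.
        output.update(rule.evaluate(record))
    return output


# Rule 1 from the Rules DB table, hand-translated to a Python callable.
age_group_rule = Rule(
    rule_id=1,
    description="Category rule to classify between age groups",
    evaluate=lambda r: {"Age_group": "Adult" if r["Age"] > 18 else "Child"},
)

result = apply_rules({"Age": 30}, [age_group_rule])
```

Here `result` holds only the fields produced by the rules; whether input fields are also carried through to the output would depend on the configured output schema.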
Rules DB
Id | Rule | Description | Fields used in the rules |
---|---|---|---|
1 | if (Age > 18) { Age_group = Adult; } else { Age_group = Child; } | Category rule to classify between age groups | Age |
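The "Fields used in the rules" column is meant to be filled in by parsing the rule. One possible way to approximate this, without a full expression parser, is sketched below. The `fields_used` function, the keyword list, and the regular expressions are all hypothetical. The sketch also assumes that string literals in rules are quoted (unlike the unquoted `Adult` and `Child` in the table above) and that a field which is assigned by the rule is an output rather than an input.

```python
import re
from typing import List

# Words treated as language keywords rather than field names (assumed subset).
KEYWORDS = {"if", "else", "true", "false", "null", "and", "or", "not"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")
# An identifier followed by a single '=' (not '==') is an assignment target.
ASSIGN_TARGET = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*=(?!=)")


def fields_used(rule_text: str) -> List[str]:
    """Return the input fields a rule reads, in order of first appearance."""
    text = QUOTED.sub("", rule_text)             # drop string literals
    targets = set(ASSIGN_TARGET.findall(text))   # fields the rule writes
    seen: List[str] = []
    for name in IDENTIFIER.findall(text):
        if name in KEYWORDS or name in targets or name in seen:
            continue
        seen.append(name)
    return seen


RULE_1 = "if (Age > 18) { Age_group = 'Adult'; } else { Age_group = 'Child'; }"
rule_1_fields = fields_used(RULE_1)
```

A known limitation of this shortcut is that a field which is both read and written by the same rule is reported as an output only; a real implementation would more likely walk the parsed expression tree.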
Rule set DB
Rule set Id | List of rules | Description |
---|---|---|
34 | | Rule set to classify target customer groups |
Approach(s)
- We will set up a rule-set DB in Tracker where a user can add rules with the specified rule-set conditions.
- If the user does not select any rule set, the output will be the result of a one-to-one mapping of the input schema to the output schema.
- The user will be able to reference the rule-set DB from within the plugin to assign rule sets.
- Rule-sets could be applied in a hierarchical manner, such that if no rule set exists for Field A, rules are looked up for Field B.
- Post actions could be taken based on the values in the output.
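The hierarchical lookup and the one-to-one fallback described above could look roughly like the following sketch. Keying rule sets by a `(field name, field value)` pair, the `select_rule_set` and `transform` names, and the decision to merge rule outputs onto a copy of the input record are assumptions made for illustration; the design does not fix any of these.

```python
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

Record = Dict[str, Any]
RuleFn = Callable[[Record], Record]
RuleSetKey = Tuple[str, Hashable]  # hypothetical key: (field name, field value)


def select_rule_set(
    record: Record,
    lookup_fields: List[str],
    rule_sets: Dict[RuleSetKey, List[RuleFn]],
) -> Optional[List[RuleFn]]:
    """Walk the lookup fields in order and return the first rule set found."""
    for field_name in lookup_fields:
        key = (field_name, record.get(field_name))
        if key in rule_sets:
            return rule_sets[key]
    return None


def transform(
    record: Record,
    lookup_fields: List[str],
    rule_sets: Dict[RuleSetKey, List[RuleFn]],
) -> Record:
    """Apply the selected rule set, or pass the record through unchanged."""
    output = dict(record)  # one-to-one mapping of input to output
    for rule in select_rule_set(record, lookup_fields, rule_sets) or []:
        output.update(rule(record))
    return output


# Example: no rule set for this device id, so the device type is used instead.
example_rule_sets: Dict[RuleSetKey, List[RuleFn]] = {
    ("device_type", "wearable"): [lambda r: {"alert": r["heart_rate"] > 120}],
}
signal = {"device_id": "d-1", "device_type": "wearable", "heart_rate": 130}
transformed = transform(signal, ["device_id", "device_type"], example_rule_sets)
```

In this example the lookup order `["device_id", "device_type"]` plays the role of "Field A, then Field B": the most specific field is tried first and the broader one acts as the fallback.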
Properties
- Rule-set DB: Name of the dataset to be used as the rules repository. If left blank, the default Tracker rules dataset will be used.
- LookUp fields: Map of output fields to the corresponding list of fields on which the rule-sets are to be applied. Multiple rules can be applied to a single field.
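To make the shape of these two properties concrete, here is a minimal sketch of how they might be held and checked. The `RulesEngineConfig` class, the `_trackerRules` default name, and the validation step are hypothetical; the design names neither the default Tracker dataset nor any validation behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical name; the design does not specify the default Tracker dataset.
DEFAULT_TRACKER_RULES_DATASET = "_trackerRules"


@dataclass
class RulesEngineConfig:
    # "Rule-set DB" property; blank means "use the Tracker default".
    rule_set_db: str = ""
    # "LookUp fields" property: output field -> input fields its rules read.
    lookup_fields: Dict[str, List[str]] = field(default_factory=dict)

    def resolved_rule_set_db(self) -> str:
        """Return the configured dataset name, or the default if blank."""
        return self.rule_set_db.strip() or DEFAULT_TRACKER_RULES_DATASET

    def validate(self, input_fields: List[str]) -> None:
        """Raise ValueError if a lookup field is absent from the input schema."""
        known = set(input_fields)
        for out_field, in_fields in self.lookup_fields.items():
            missing = [name for name in in_fields if name not in known]
            if missing:
                raise ValueError(
                    f"Output field '{out_field}' refers to unknown input "
                    f"fields: {missing}"
                )


config = RulesEngineConfig(lookup_fields={"Age_group": ["Age"]})
config.validate(["Age", "Name"])  # passes: 'Age' is in the input schema
```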
Security
Limitation(s)
Point lookups are not currently supported by Spark Streaming.
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature