Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

Briefly write the need for this feature

Goals

Clearly state the design goals/requirements for this feature 

User Stories 

  • Breakdown of User-Stories 
  • User Story #1
  • User Story #2
  • User Story #3

Design

 

For design purposes, consider the following pipeline:

                                                                            TRUE
File (Source) -> CSV Parser (Transform) -> Filter (Transform) -> Condition1 ------> Logistic Regression (Sink)
                                                                     |
                                                               FALSE |      TRUE
                                                                 Condition2 ------> Random Forest (Sink)
                                                                     |
                                                               FALSE |      TRUE
                                                                 Condition3 ------> Decision Tree (Sink)
 

In the above pipeline, we want to execute the classification algorithm based on the runtime argument 'input.algorithm'. We also do not want to run the expensive model generation process if the Filter transform did not produce enough records to proceed further.

The pipeline is configured with 3 condition nodes:

  1. Condition1: output.filter Greater Than 1000 AND input.algorithm Equals 'Logistic Regression'
  2. Condition2: output.filter Greater Than 1000 AND input.algorithm Equals 'Random Forest'
  3. Condition3: output.filter Greater Than 1000 AND input.algorithm Equals 'Decision Tree'
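The routing implied by these three conditions can be sketched as below. This is only an illustration of the intended behavior, not the pipeline engine's implementation: the function name `choose_sink`, the `CONDITION_CHAIN` table, and the argument names are hypothetical. The diagram shows no FALSE connection out of Condition3, so the sketch assumes that no sink runs in that case.

```python
from typing import Optional

# Threshold and algorithm names come from the three conditions listed above.
MIN_RECORDS = 1000

# (condition node, required algorithm, sink reached on TRUE), in evaluation order.
CONDITION_CHAIN = [
    ("Condition1", "Logistic Regression", "Logistic Regression (Sink)"),
    ("Condition2", "Random Forest", "Random Forest (Sink)"),
    ("Condition3", "Decision Tree", "Decision Tree (Sink)"),
]


def choose_sink(filter_output_count: int, input_algorithm: str) -> Optional[str]:
    """Return the sink of the first condition that is TRUE, or None if none is."""
    for _node, algorithm, sink in CONDITION_CHAIN:
        if filter_output_count > MIN_RECORDS and input_algorithm == algorithm:
            return sink
        # FALSE branch: fall through to the next condition node in the chain.
    # Assumption: Condition3 has no FALSE connection, so no sink is executed.
    return None
```

Under these assumptions, `choose_sink(5000, "Random Forest")` selects the Random Forest sink, while any run in which the Filter emits 1000 records or fewer selects no sink, so the expensive model generation is skipped.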

Representation of the Condition in the Pipeline config

The following is one possible representation of the condition stage in the pipeline config JSON. Because conditions are individual stages, they also appear in the connections section, like any other stage.

{  
   "name":"Condition1",
   "plugin":{  
      "name":"Condition",
      "type":"condition",
      "label":"Condition1",
      "artifact":{  
         "name":"condition-plugins",
         "version":"1.7.0",
         "scope":"SYSTEM"
      },
      "properties":{  
         "conditions":{  
            "cond1":{  
               "subject":"output.filter",
               "operator":"Greater Than",
               "target":"1000"
            },
            "cond2":{  
               "subject":"input.algorithm",
               "operator":"Equals",
               "target":"Logistic Regression"
            },
            "expressions":[  
               {  
                  "operator":"AND",
                  "operand1":"cond1",
                  "operand2":"cond2"
               }
            ],
            "connectors":{  
               "TRUE":"LogisticRegressionStage",
               "FALSE":"Condition2"
            }
         }
      }
   }
}
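To make the semantics of this representation concrete, here is a minimal sketch of how a condition stage could be evaluated against runtime values. It is not an existing API: `next_stage`, `OPERATORS`, and the `context` dictionary (keyed by the `subject` names) are hypothetical, and the sketch assumes that "Greater Than" compares numerically, that a missing subject makes a condition false, and that every entry under `expressions` must hold. The embedded config is the one above with the `artifact` block omitted.

```python
import json


def _as_number(value):
    """Return value as a float, or None when it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _greater_than(subject, target):
    left, right = _as_number(subject), _as_number(target)
    # Assumption: a missing or non-numeric operand makes the comparison false.
    return left is not None and right is not None and left > right


# Hypothetical operator table covering only the operators used in the config.
OPERATORS = {
    "Greater Than": _greater_than,
    "Equals": lambda subject, target: str(subject) == str(target),
}


def next_stage(stage_config: dict, context: dict) -> str:
    """Evaluate a condition stage and return the name of the stage to run next."""
    props = stage_config["plugin"]["properties"]["conditions"]
    results = {}
    for name, cond in props.items():
        if name in ("expressions", "connectors"):
            continue  # these keys are not individual conditions
        evaluate = OPERATORS[cond["operator"]]
        results[name] = evaluate(context.get(cond["subject"]), cond["target"])
    outcome = True
    for expr in props["expressions"]:
        if expr["operator"] != "AND":
            raise ValueError("unsupported operator: " + expr["operator"])
        outcome = outcome and results[expr["operand1"]] and results[expr["operand2"]]
    return props["connectors"]["TRUE" if outcome else "FALSE"]


CONDITION1 = json.loads("""
{
  "name": "Condition1",
  "plugin": {
    "name": "Condition",
    "type": "condition",
    "properties": {
      "conditions": {
        "cond1": {"subject": "output.filter", "operator": "Greater Than", "target": "1000"},
        "cond2": {"subject": "input.algorithm", "operator": "Equals", "target": "Logistic Regression"},
        "expressions": [{"operator": "AND", "operand1": "cond1", "operand2": "cond2"}],
        "connectors": {"TRUE": "LogisticRegressionStage", "FALSE": "Condition2"}
      }
    }
  }
}
""")
```

With `context = {"output.filter": 2500, "input.algorithm": "Logistic Regression"}`, `next_stage(CONDITION1, context)` resolves to `LogisticRegressionStage`; changing either value so that a condition fails resolves to `Condition2`.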
      
    

 

Approach

Approach #1

Approach #2

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Response Code | Response
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors |

 

     

Deprecated REST API

Path | Method | Description
/v3/apps/<app-id> | GET | Returns the application spec for a given application

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What is the impact on authorization, and how does the design take care of this aspect?

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of failures in downstream components [YARN, HBase, etc.]) and how the design takes care of these aspects

Test Scenarios

Test ID | Test Description | Expected Results
   
   
   
   

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work