Checklist

User Stories Documented
User Stories Reviewed
Design Reviewed
APIs reviewed
Release priorities assigned
Test cases reviewed
Blog post

Introduction

Today, a client that wants the last X runs for each of N different programs has to make N separate calls. One example is the pipeline list UI, which fetches the latest run for each pipeline in the list.

Goals

Make it possible for a client to get the last X runs for a list of programs in a single API call.

User Stories

As a CDAP ops developer, I want to write a monitoring check that gets the latest run for multiple programs in a single call and alerts if any of them have not run as expected
As a UI, I want to make a single call to fetch the latest run for multiple programs in order to display run information in a program list view
As a CDAP client, if a program in the request does not exist, I want to be able to tell from the response

Design

We will add a batch runs endpoint, similar to the existing batch status endpoint. Its request is a list of programs, and its response contains one entry per requested program with that program's runs. If a program does not exist, its entry carries a 404 status code instead of runs. The scan for each program's latest runs happens in a separate transaction, so the endpoint is functionally equivalent to making N separate calls to the runs endpoint, but with a single HTTP round trip.
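
The per-program lookup described above could be sketched in Java as follows. This is a minimal, hypothetical sketch, not CDAP's implementation: `ProgramRunsRequest`, `ProgramRunsResult`, `RunStore`, `InMemoryRunStore` and `BatchRunsHandler` are illustrative names, run records are reduced to run ids, and each `RunStore.latestRuns` call merely stands in for one per-program transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// One program in the request body (hypothetical model of a request entry).
final class ProgramRunsRequest {
  final String appId;
  final String programType;
  final String programId;
  final Integer limit; // null when the client omits "limit"

  ProgramRunsRequest(String appId, String programType, String programId, Integer limit) {
    this.appId = appId;
    this.programType = programType;
    this.programId = programId;
    this.limit = limit;
  }
}

// One entry in the response; runIds is null when statusCode is 404.
final class ProgramRunsResult {
  final String appId;
  final String programType;
  final String programId;
  final int statusCode;
  final List<String> runIds;

  ProgramRunsResult(ProgramRunsRequest req, int statusCode, List<String> runIds) {
    this.appId = req.appId;
    this.programType = req.programType;
    this.programId = req.programId;
    this.statusCode = statusCode;
    this.runIds = runIds;
  }
}

// Hypothetical store; each call is assumed to run in its own transaction.
interface RunStore {
  // Empty if the program does not exist, else up to `limit` run ids, newest first.
  Optional<List<String>> latestRuns(String appId, String programType, String programId, int limit);
}

// In-memory stand-in keyed by "app:type:program", with run ids stored newest first.
final class InMemoryRunStore implements RunStore {
  private final Map<String, List<String>> runsByProgram;

  InMemoryRunStore(Map<String, List<String>> runsByProgram) {
    this.runsByProgram = runsByProgram;
  }

  @Override
  public Optional<List<String>> latestRuns(String appId, String programType, String programId, int limit) {
    List<String> runs = runsByProgram.get(appId + ":" + programType + ":" + programId);
    if (runs == null) {
      return Optional.empty();
    }
    int n = Math.min(Math.max(limit, 0), runs.size()); // clamp to [0, size]
    List<String> latest = new ArrayList<>(runs.subList(0, n));
    return Optional.of(latest);
  }
}

final class BatchRunsHandler {
  static final int DEFAULT_LIMIT = 1; // the design's default when "limit" is omitted

  // Looks up each program independently, preserving request order.
  static List<ProgramRunsResult> getRuns(RunStore store, List<ProgramRunsRequest> requests) {
    List<ProgramRunsResult> results = new ArrayList<>();
    for (ProgramRunsRequest req : requests) {
      int limit = req.limit == null ? DEFAULT_LIMIT : req.limit;
      Optional<List<String>> runs = store.latestRuns(req.appId, req.programType, req.programId, limit);
      results.add(runs.isPresent()
          ? new ProgramRunsResult(req, 200, runs.get())
          : new ProgramRunsResult(req, 404, null));
    }
    return results;
  }
}
```

Keeping each lookup independent is what makes the endpoint behave like N calls to the runs endpoint: a missing program yields a 404 entry instead of failing the whole request.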

The request will be:

POST /v3/namespaces/<namespace-id>/runs
[
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWorkflow",
    "limit": 5 // optional limit for the number of runs. Defaults to 1
  }
]

Note: this request format mirrors the POST /v3/namespaces/<namespace-id>/status endpoint. I think it would be better if the request were an object with a 'programs' field that lists the programs, but it is a plain list for consistency with the existing status endpoint.
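
As an illustration of the request format, the following sketch builds the request body and an HTTP request with the JDK's `java.net.http` client (Java 11+). The helper class `BatchRunsRequestBuilder` is hypothetical; it assumes program identifiers need no JSON escaping, and the base URL passed to it is an assumption about where the CDAP router is reachable. A real client would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper for the batch runs endpoint; assumes ids need no JSON escaping.
final class BatchRunsRequestBuilder {
  // Serializes one request entry; "limit" is omitted when limit <= 0 so the server default (1) applies.
  static String entryJson(String appId, String programType, String programId, int limit) {
    StringBuilder sb = new StringBuilder()
        .append("{\"appId\":\"").append(appId)
        .append("\",\"programType\":\"").append(programType)
        .append("\",\"programId\":\"").append(programId).append('"');
    if (limit > 0) {
      sb.append(",\"limit\":").append(limit);
    }
    return sb.append('}').toString();
  }

  // The body is a bare JSON list, mirroring the batch status endpoint.
  static String body(List<String> entries) {
    return "[" + String.join(",", entries) + "]";
  }

  // Builds (but does not send) the POST request for the given namespace.
  static HttpRequest build(String baseUrl, String namespace, String body) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/v3/namespaces/" + namespace + "/runs"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```

Sending the result with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would return the list-shaped response as a string.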

The response will be:

[
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWorkflow",
    "statusCode": 200,
    "runs": [
      {
        "runid": "<run-id>",
        "starting": 1234567890,
        ... // same content as the program specific runs endpoint
      },
      ...
    ]
  },
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWurkflu",
    "statusCode": 404 // if the program doesn't exist
  }
]
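
For the monitoring user story, a client could walk the response entries and flag problems as in this sketch. It assumes the JSON has already been parsed into the hypothetical `ProgramRunsEntry` class, and that each run record carries a `status` field whose failure value is `FAILED`; both are assumptions to check against the runs endpoint's actual output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsed form of one response entry (JSON parsing not shown).
final class ProgramRunsEntry {
  final String appId;
  final String programType;
  final String programId;
  final int statusCode;
  final List<String> runStatuses; // assumed "status" of each returned run, newest first

  ProgramRunsEntry(String appId, String programType, String programId, int statusCode,
                   List<String> runStatuses) {
    this.appId = appId;
    this.programType = programType;
    this.programId = programId;
    this.statusCode = statusCode;
    // 404 entries have no "runs" field, so treat a missing list as empty.
    this.runStatuses = runStatuses == null ? List.<String>of() : runStatuses;
  }
}

final class RunsMonitor {
  // Returns one alert message per program that has not run as expected.
  static List<String> problems(List<ProgramRunsEntry> entries) {
    List<String> alerts = new ArrayList<>();
    for (ProgramRunsEntry e : entries) {
      String name = e.appId + "." + e.programType + "." + e.programId;
      if (e.statusCode == 404) {
        alerts.add(name + ": program not found");
      } else if (e.statusCode != 200) {
        alerts.add(name + ": unexpected status " + e.statusCode);
      } else if (e.runStatuses.isEmpty()) {
        alerts.add(name + ": no runs");
      } else if ("FAILED".equals(e.runStatuses.get(0))) {
        alerts.add(name + ": latest run failed");
      }
    }
    return alerts;
  }
}
```

Because missing programs come back as 404 entries rather than as a failed request, one check can report both typos in its program list and genuinely failing programs.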

API changes

New Programmatic APIs

None

Deprecated Programmatic APIs

New REST APIs

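Path	Method	Description	Response Code	Response
/v3/namespaces/<namespace-id>/runs	POST	Returns the most recent runs for each program in the request, up to that entry's limit (default 1)	200 - On success; 500 - Any internal errors	A list with one entry per requested program; programs that don't exist have a statusCode of 404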

Deprecated REST API

None

CLI Impact or Changes

A new CLI command could be added, but none is planned.

UI Impact or Changes

The UI can use this endpoint in the pipelines list view to replace one call per pipeline with a single call, improving performance.
A side effect is that the UI can no longer fill in data as each response arrives. It will be all or nothing.

Security Impact

None

Impact on Infrastructure Outages

None

Test Scenarios

Test ID	Test Description	Expected Results
1	Get the last 5 runs for programs that all exist and have runs	All programs should be returned, with their last 5 runs
3	Get the last run for a mix of programs that exist and don't exist	The request should succeed; programs that don't exist should still appear in the response, with a 404 (not found) status code
4	Get the last 10 runs for a mix of programs that have more than 10 runs and fewer than 10 runs	Each program should have at most 10 runs: programs with more than 10 runs return their 10 most recent, and programs with fewer return all of their runs

Batch Runs Endpoint

Releases

Release 5.1.0

Related Work

Future Work