Batch Runs Endpoint

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

Today, a client that wants the last X runs for each of N different programs has to make N calls. One example is the pipeline list UI, which fetches the latest run for each pipeline in the list.

Goals

Make it possible for a client to get the last X runs for a list of programs in a single API call.

User Stories 

  • As a CDAP ops developer, I want to write a monitoring check that gets the latest run for multiple programs in a single call and alerts if any of them have not run as expected
  • As a UI, I want to make a single call to fetch the latest run for multiple programs in order to display run information in a program list view
  • As a CDAP client, if a program in the request does not exist, I want to be able to tell from the response
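For the first and third stories, a monitoring check could consume the batch response and flag problems. The sketch below is a minimal Python illustration, assuming the response format described in the Design section. It also assumes (not stated in this design) that each run record carries a `status` field such as `COMPLETED` or `FAILED`, and that runs are ordered newest first. The function name `find_alerts` is hypothetical.

```python
from typing import Any, Dict, List


def find_alerts(response: List[Dict[str, Any]]) -> List[str]:
    """Return alerts for programs that are missing or whose latest run did not complete.

    Assumes `response` is the decoded JSON array from the batch runs endpoint,
    that run records have a 'status' field, and that runs are newest first
    (all assumptions for this sketch).
    """
    alerts = []
    for entry in response:
        name = f"{entry['appId']}.{entry['programType']}.{entry['programId']}"
        if entry.get("statusCode") != 200:
            # Story 3: a non-200 entry (e.g. 404) means the program was not found.
            alerts.append(f"{name}: status code {entry.get('statusCode')}")
            continue
        runs = entry.get("runs", [])
        if not runs:
            alerts.append(f"{name}: no runs found")
        elif runs[0].get("status") != "COMPLETED":
            # runs[0] is taken to be the latest run (assumed newest-first ordering).
            alerts.append(f"{name}: latest run status is {runs[0].get('status')}")
    return alerts


# Example input shaped like the response in the Design section.
sample = [
    {"appId": "my-app", "programType": "WORKFLOW", "programId": "DataPipelineWorkflow",
     "statusCode": 200, "runs": [{"runid": "r1", "starting": 1234567890, "status": "FAILED"}]},
    {"appId": "my-app", "programType": "WORKFLOW", "programId": "DataPipelineWurkflu",
     "statusCode": 404},
]
for alert in find_alerts(sample):
    print(alert)
```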

Design

We will add a batch runs endpoint, similar to the existing batch status endpoint. It takes a list of programs in its request and returns, for each program in the request, that program's runs. If a program does not exist, this is indicated in that program's entry in the response. The scan for the latest runs happens in a separate transaction for each program, so the call is functionally equivalent to making N separate calls to the runs endpoint.
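The per-program handling described above can be sketched in Python as follows. This is a minimal illustration, not the CDAP implementation: `RunStore`, `latest_runs`, and `handle_batch_runs` are hypothetical names, and the in-memory store stands in for the run record store. In the real service each lookup would run in its own transaction; the sketch models that only as an independent call per program.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

ProgramKey = Tuple[str, str, str]  # (appId, programType, programId)


@dataclass
class RunStore:
    """Hypothetical in-memory stand-in for the run record store.

    A program that exists but has never run maps to an empty list.
    """
    runs: Dict[ProgramKey, List[Dict[str, Any]]] = field(default_factory=dict)

    def latest_runs(self, key: ProgramKey, limit: int) -> Optional[List[Dict[str, Any]]]:
        """Return up to `limit` newest runs, or None if the program does not exist."""
        if key not in self.runs:
            return None
        ordered = sorted(self.runs[key], key=lambda r: r["starting"], reverse=True)
        return ordered[:limit]


def handle_batch_runs(store: RunStore, request: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    response = []
    for item in request:
        key = (item["appId"], item["programType"], item["programId"])
        # Each program is looked up independently (one transaction per program
        # in the design), matching N separate calls to the runs endpoint.
        runs = store.latest_runs(key, item.get("limit", 1))  # limit defaults to 1
        entry: Dict[str, Any] = {"appId": key[0], "programType": key[1], "programId": key[2]}
        if runs is None:
            entry["statusCode"] = 404
        else:
            entry["statusCode"] = 200
            entry["runs"] = runs
        response.append(entry)
    return response
```

In this sketch, existence is tracked separately from having runs, so a program that exists but has never run would return 200 with an empty runs list rather than 404.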


The request will be:

POST v3/namespaces/<namespace-id>/runs
[
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWorkflow",
    "limit": 5 // optional limit for the number of runs. Defaults to 1
  }
] 
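As an illustration of the request shape, a client could build this POST with only the Python standard library. This is a sketch: the base URL `http://localhost:11015` is an assumption (substitute your CDAP router address), and `build_batch_runs_request` is a hypothetical helper. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_batch_runs_request(base_url: str, namespace: str,
                             programs: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST to the batch runs endpoint."""
    url = f"{base_url.rstrip('/')}/v3/namespaces/{namespace}/runs"
    body = json.dumps(programs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


# 'limit' is optional; when it is omitted the server defaults it to 1.
req = build_batch_runs_request("http://localhost:11015", "default", [
    {"appId": "my-app", "programType": "WORKFLOW", "programId": "DataPipelineWorkflow", "limit": 5},
    {"appId": "my-app", "programType": "WORKFLOW", "programId": "OtherWorkflow"},
])
# To send it against a running instance: json.load(urllib.request.urlopen(req))
```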

Note: this request format mirrors the POST v3/namespaces/<namespace-id>/status endpoint. I think it would actually be better if the request were an object with a 'programs' field that lists the programs, but it is designed this way for API consistency.


The response will be:

[
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWorkflow",
    "statusCode": 200,
    "runs": [
      {
        "runid": "<run-id>",
        "starting": 1234567890,
        ... // same content as the program specific runs endpoint
      },
      ...
    ]
  },
  {
    "appId": "my-app",
    "programType": "WORKFLOW",
    "programId": "DataPipelineWurkflu",
    "statusCode": 404 // if the program doesn't exist
  }


]
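Because a missing program is reported in its own entry rather than failing the whole request, a client should check each entry's `statusCode`. A minimal sketch, assuming the response has already been decoded from JSON; `split_response` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

ProgramKey = Tuple[str, str, str]  # (appId, programType, programId)


def split_response(
    response: List[Dict[str, Any]],
) -> Tuple[Dict[ProgramKey, List[Dict[str, Any]]], Dict[ProgramKey, int]]:
    """Separate found programs (mapped to their runs) from failed entries (mapped to status code)."""
    found: Dict[ProgramKey, List[Dict[str, Any]]] = {}
    failed: Dict[ProgramKey, int] = {}
    for entry in response:
        key = (entry["appId"], entry["programType"], entry["programId"])
        if entry["statusCode"] == 200:
            # Default to an empty list in case an existing program has no runs.
            found[key] = entry.get("runs", [])
        else:
            # For example, 404 when the program does not exist.
            failed[key] = entry["statusCode"]
    return found, failed
```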


API changes

New Programmatic APIs

None

Deprecated Programmatic APIs

None

New REST APIs

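Path: /v3/namespaces/<namespace-id>/runs
Method: POST
Description: Returns the most recent runs (up to each program's "limit", default 1) for each program in the request
Response Codes:
  • 200 - On success (programs that do not exist are reported with a per-program statusCode of 404 inside the response)
  • 500 - Any internal errors
Response: A JSON array with one entry per requested program, as shown in the Design section
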
Deprecated REST API

None

CLI Impact or Changes

  • A new CLI command could be added, but none is planned

UI Impact or Changes

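  • The UI can use this endpoint in the pipelines list view to improve performance, replacing one call per pipeline with a single call
  • A side effect is that the UI can no longer fill in data as individual responses arrive; results for all pipelines arrive together in one response (all or nothing)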

Security Impact 

None

Impact on Infrastructure Outages 

None

Test Scenarios

Test 1
  Description: Get the last 5 runs for programs that all exist and have runs
  Expected result: All programs are returned, each with its last 5 runs

Test 3
  Description: Get the last run for a mix of programs that exist and programs that don't
  Expected result: The request succeeds; programs that don't exist still appear in the response, with a not-found (404) status

Test 4
  Description: Get the last 10 runs for a mix of programs with more than 10 runs and programs with fewer than 10 runs
  Expected result: Each program has at most 10 runs in the response
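The expected results above can be expressed as a reusable check. The following is a sketch in Python, assuming the decoded response format from the Design section and assuming (not confirmed by this design) that response entries come back in the same order as the request. `check_batch_runs_response` is a hypothetical helper, not an existing CDAP test utility.

```python
from typing import Any, Dict, List, Set, Tuple

ProgramKey = Tuple[str, str, str]  # (appId, programType, programId)


def check_batch_runs_response(request: List[Dict[str, Any]],
                              response: List[Dict[str, Any]],
                              existing: Set[ProgramKey]) -> None:
    """Raise AssertionError if the response violates the expected results of the test scenarios."""
    # Every requested program must appear, including ones that do not exist (Test 3).
    assert len(response) == len(request)
    # Assumes response order matches request order.
    for req, entry in zip(request, response):
        key = (req["appId"], req["programType"], req["programId"])
        assert (entry["appId"], entry["programType"], entry["programId"]) == key
        if key in existing:
            assert entry["statusCode"] == 200
            # No program may return more runs than its requested limit (Tests 1 and 4).
            assert len(entry["runs"]) <= req.get("limit", 1)
        else:
            assert entry["statusCode"] == 404
            assert "runs" not in entry
```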

Releases

Release 5.1.0

Related Work

  • UI to use this new endpoint for pipelines list view


Future Work

None