Checklist
- Design reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
Runtime Monitor will monitor and collect program states, metadata, lineage, workflow tokens, etc.
Approaches
Approach #1
To collect all the monitoring data, Runtime Monitor periodically polls Heartbeat Handler for heartbeat messages through a single REST endpoint:
- Runtime Monitor requests the next batch of heartbeat messages, sending the last persisted offset for each topic.
- Heartbeat Handler fetches heartbeat messages from each topic (status, lineage, metadata, etc.) starting from the last persisted offset provided by Runtime Monitor.
- Heartbeat Handler gathers all the heartbeat messages and sends them to Runtime Monitor in one batch, along with the processed offset for each topic.
- If Runtime Monitor fails, it restarts from the last persisted offset for each topic and requests the heartbeat messages after that offset.
- If Runtime Monitor fails while it is updating the corresponding stores, it may reprocess some heartbeat messages, depending on the last persisted offset.
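The polling steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual implementation: the class names `HeartbeatPollingSketch`, `Batch`, `HeartbeatHandler`, and `RuntimeMonitor`, the `fetch` and `pollOnce` methods, and the per-topic limit are all hypothetical, and each topic is modeled as a plain list whose offset is the index of the next unread message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Approach #1 polling protocol; none of these names are real APIs. */
final class HeartbeatPollingSketch {

  /** One response from the handler: messages per topic plus the processed offset per topic. */
  static final class Batch {
    final Map<String, List<String>> messages;
    final Map<String, Integer> processedOffsets;

    Batch(Map<String, List<String>> messages, Map<String, Integer> processedOffsets) {
      this.messages = messages;
      this.processedOffsets = processedOffsets;
    }
  }

  /** Stand-in for Heartbeat Handler; a topic is a list and an offset is the next index to read. */
  static final class HeartbeatHandler {
    private final Map<String, List<String>> topics = new HashMap<>();

    void publish(String topic, String message) {
      topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(message);
    }

    /** Reads up to limitPerTopic messages from every topic, starting at the caller's offsets. */
    Batch fetch(Map<String, Integer> lastOffsets, int limitPerTopic) {
      Map<String, List<String>> messages = new HashMap<>();
      Map<String, Integer> processed = new HashMap<>();
      for (Map.Entry<String, List<String>> e : topics.entrySet()) {
        int size = e.getValue().size();
        int from = Math.min(lastOffsets.getOrDefault(e.getKey(), 0), size);
        int to = Math.min(size, from + limitPerTopic);
        messages.put(e.getKey(), new ArrayList<>(e.getValue().subList(from, to)));
        processed.put(e.getKey(), to);
      }
      return new Batch(messages, processed);
    }
  }

  /** Stand-in for Runtime Monitor; the "store" is a list and offsets are kept in a map. */
  static final class RuntimeMonitor {
    final Map<String, Integer> persistedOffsets = new HashMap<>();
    final List<String> store = new ArrayList<>();

    /** Performs one poll and returns the number of messages applied to the store. */
    int pollOnce(HeartbeatHandler handler, int limitPerTopic) {
      Batch batch = handler.fetch(persistedOffsets, limitPerTopic);
      int applied = 0;
      for (Map.Entry<String, List<String>> e : batch.messages.entrySet()) {
        for (String message : e.getValue()) {
          store.add(e.getKey() + ":" + message);
          applied++;
        }
        // The offset is persisted only after the topic's messages are applied.
        // A crash before this line means the same messages are fetched again.
        persistedOffsets.put(e.getKey(), batch.processedOffsets.get(e.getKey()));
      }
      return applied;
    }
  }

  public static void main(String[] args) {
    HeartbeatHandler handler = new HeartbeatHandler();
    handler.publish("status", "s1");
    handler.publish("status", "s2");
    handler.publish("status", "s3");
    handler.publish("lineage", "l1");

    RuntimeMonitor monitor = new RuntimeMonitor();
    System.out.println(monitor.pollOnce(handler, 2)); // two status messages and one lineage message
    System.out.println(monitor.pollOnce(handler, 2)); // the remaining status message
  }
}
```

Because the offset is written after the store update in this sketch, a failure between the two steps leads to reprocessing rather than message loss, which suggests that store updates should tolerate duplicate heartbeat messages.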
Pros:
- Fewer HTTP requests.
- Runtime Monitor polls periodically, so a single REST endpoint means fewer requests for the web server running in Heartbeat Handler to serve.
Cons:
- Heartbeat Handler must balance load across all the topics so that recent information reaches Runtime Monitor with very little delay.
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Request Body | Response Code | Response |
---|---|---|---|---|---|
/v3/namespaces/{namespace}/programs/status | GET | Returns the list of status messages for all programs in a given namespace | batchsize, start_offset | 200: success; 204: no content; 500: internal error | |
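
As a rough illustration of how a client might address this endpoint and interpret the response codes listed above, consider the sketch below. It assumes that `batchsize` and `start_offset` are passed as query parameters, which the table does not state; the class `ProgramStatusRequestSketch` and its methods are hypothetical, and the namespace is assumed to need no URL encoding.

```java
import java.net.URI;

/** Hypothetical client-side helper for the program status endpoint. */
final class ProgramStatusRequestSketch {

  /** Builds the request URI; passing the two values as query parameters is an assumption. */
  static URI statusUri(String baseUrl, String namespace, int batchSize, long startOffset) {
    return URI.create(baseUrl + "/v3/namespaces/" + namespace + "/programs/status"
        + "?batchsize=" + batchSize + "&start_offset=" + startOffset);
  }

  /** Maps the response codes from the table to the action a poller might take. */
  static String interpret(int responseCode) {
    switch (responseCode) {
      case 200:
        return "process the returned status messages";
      case 204:
        return "no new messages; poll again later";
      case 500:
        return "internal error; retry from the same offset";
      default:
        return "unexpected response code";
    }
  }

  public static void main(String[] args) {
    System.out.println(statusUri("http://example.com", "default", 100, 0L));
    System.out.println(interpret(204));
  }
}
```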
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design take care of this aspect?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of downstream component failures such as YARN, HBase, etc.) and how the design takes care of these aspects.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3