Operations Dashboard
API Requirements
Graph
Information Provided:
- List of namespaces
- Start Time
- End Time
- Time Resolution
Information Needed:
- Memory Usage over time per namespace, cluster, and max available
- Core Usage over time per namespace, cluster, and max available
- Program runs aggregated into buckets of the given time resolution. Within each bucket we should be able to distinguish pipelines from custom apps. Each bucket includes:
- Manual start
- Scheduled start
- Status (RUNNING, COMPLETED, FAILED)
- Delay between STARTING and RUNNING
- If the start time and end time are in the future, show the scheduled apps
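The bucketed aggregate described above could be computed roughly as follows. This is only a sketch under stated assumptions: the run record keys (`starting`, `running`, `kind`, `start_method`, `status`) and the function name `bucket_runs` are hypothetical, and all times are assumed to be epoch seconds.

```python
from collections import Counter


def bucket_runs(runs, start, end, resolution):
    """Group runs into fixed-width buckets keyed by bucket start time.

    Assumed (hypothetical) run record keys: "starting" and "running" are
    epoch seconds, "kind" is "pipeline" or "custom_app", "start_method" is
    "manual" or "scheduled", "status" is RUNNING, COMPLETED or FAILED.
    `resolution` is the bucket width in seconds.
    """
    buckets = {}
    for run in runs:
        ts = run["starting"]
        if not (start <= ts < end):
            continue  # run started outside the requested time range
        key = start + ((ts - start) // resolution) * resolution
        bucket = buckets.setdefault(key, {
            "kind": Counter(),
            "start_method": Counter(),
            "status": Counter(),
            "start_delays": [],  # seconds between STARTING and RUNNING
        })
        bucket["kind"][run["kind"]] += 1
        bucket["start_method"][run["start_method"]] += 1
        bucket["status"][run["status"]] += 1
        if run.get("running") is not None:
            bucket["start_delays"].append(run["running"] - ts)
    return buckets
```

Runs are assigned to a bucket by their STARTING time here; whether a long run should also count in later buckets is left open.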
Details when Graph Time Range is Clicked
Information Provided:
- List of namespaces
- Start Time
- End Time
Information Needed:
- Entity Details:
- Namespace
- App Name
- Program Type
- Program Name
- Parent Artifact
- Duration
- User
- Start Method (time schedule, trigger, manual)
- Status
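One possible shape for the entity details returned when a graph time range is clicked is sketched below. The class name `RunDetail`, its field names, and the helper `runs_in_range` are hypothetical. The sketch assumes times are epoch seconds and that a run belongs to the range if it started within it. The "unknown" defaults follow the answer below about runs recorded before an upgrade to 5.0.0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunDetail:
    # Field names are illustrative, mirroring the list above.
    namespace: str
    app_name: str
    program_type: str
    program_name: str
    start: int                  # epoch seconds
    end: Optional[int]          # None while the run is still active
    user: str
    status: str                 # RUNNING, COMPLETED or FAILED
    parent_artifact: str = "unknown"
    start_method: str = "unknown"   # time schedule, trigger or manual

    @property
    def duration(self) -> Optional[int]:
        """Duration in seconds, or None if the run has not ended."""
        return None if self.end is None else self.end - self.start


def runs_in_range(runs: List[RunDetail], namespaces: List[str],
                  start: int, end: int) -> List[RunDetail]:
    """Return runs in the given namespaces that started in [start, end)."""
    wanted = set(namespaces)
    return [r for r in runs if r.namespace in wanted and start <= r.start < end]
```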
Reports View
Information Provided:
- List of namespaces
- List of statuses
- Start Time
- End Time
Information Needed:
- Entity Details:
- Namespace
- App Name
- Program Type
- Program Name
- Parent Artifact
- Duration
- User
- Start Method
- Status
- Runtime Arguments
- Memory Usage
- Number of CPUs
- Number of Containers
- Number of Log Warnings
- Number of Log Errors
- Number of records out
- Summary Counts:
- Runs per namespace
- Time range
- Pipelines (Realtime vs Batch), custom apps
- Durations: min, max, average
- Last Started: Oldest and Newest
- List of users & count per user
- List of start methods & count per method
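The summary counts above could be derived from the per-run entity details along these lines. This is a sketch, not a definitive implementation: the function name `summarize` and the run record keys (`namespace`, `kind`, `user`, `start_method`, `start`, `duration`) are hypothetical, and "Last Started: Oldest and Newest" is interpreted here as the earliest and latest start times among the runs.

```python
from collections import Counter


def summarize(runs):
    """Compute report summary counts from a list of run dicts.

    Assumed (hypothetical) keys per run: "namespace", "kind" (for example
    "realtime_pipeline", "batch_pipeline" or "custom_app"), "user",
    "start_method", "start" (epoch seconds) and "duration" (seconds, or
    None if the run has not finished).
    """
    durations = [r["duration"] for r in runs if r.get("duration") is not None]
    starts = [r["start"] for r in runs]
    return {
        "runs_per_namespace": dict(Counter(r["namespace"] for r in runs)),
        "runs_per_kind": dict(Counter(r["kind"] for r in runs)),
        "durations": {
            "min": min(durations),
            "max": max(durations),
            "average": sum(durations) / len(durations),
        } if durations else None,
        "last_started": {
            "oldest": min(starts),
            "newest": max(starts),
        } if starts else None,
        "runs_per_user": dict(Counter(r["user"] for r in runs)),
        "runs_per_start_method": dict(Counter(r["start_method"] for r in runs)),
    }
```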
Answered Questions:
1. For an older version of CDAP that is upgraded to 5.0.0 and doesn’t have some information (e.g. program start methods, program parent artifact), that information will be displayed as unknown.
2. Future timeline: only time-triggered schedules will be displayed. The design should be updated to grey out the statuses and the manually started runs in the graph.
4. How should the runs list be displayed for Batch vs Realtime vs Custom Apps (collapsed by workflow? What about programs started outside a workflow?): in the frontend, users can choose to expand a custom app to show details of the different programs in the app.
5. In the Dashboard view, we need to limit the time window to a fixed range, such as 24 hours, in order to display the data in real time.
6. After the user selects the options and clicks generate report, a (Spark?) job will be launched. If the job takes less than a specific time (10 sec?) to finish, the UI will display the report directly. Otherwise, the UI will ask the user to wait for the report. When the job finishes, a permalink will be produced that is accessible only by the user who generated the report. If the user chooses to share the report with others, a different link will be generated that is viewable by other users.
7. The report will only contain programs that are readable by the user who generates the report.
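The wait-or-defer behaviour in answer 6 could look roughly like the sketch below. It uses a local thread pool purely as a stand-in for the (Spark?) job, and the names `request_report`, `fetch_report`, and the in-memory `_pending` registry are hypothetical. The returned `id` stands in for whatever the permalink would be built from. Access control and sharing links are not modelled.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Stand-in for the real job launcher; an assumption for illustration only.
_executor = ThreadPoolExecutor(max_workers=2)
_pending = {}  # report id -> Future of a report still being generated


def request_report(generate, wait_seconds=10.0):
    """Start report generation and wait up to `wait_seconds` for it.

    `generate` is a zero-argument callable that builds the report.
    Returns the report directly if it finishes in time; otherwise
    returns a pending marker that can be polled with fetch_report().
    """
    report_id = uuid.uuid4().hex
    future = _executor.submit(generate)
    try:
        report = future.result(timeout=wait_seconds)
    except FutureTimeout:
        _pending[report_id] = future  # keep the job so it can be polled
        return {"status": "pending", "id": report_id}
    return {"status": "ready", "id": report_id, "report": report}


def fetch_report(report_id):
    """Poll a report that was not ready within the initial wait."""
    future = _pending[report_id]
    if not future.done():
        return {"status": "pending", "id": report_id}
    return {"status": "ready", "id": report_id, "report": future.result()}
```

A job that fails would surface its exception from `future.result()`; how failures should be reported to the user is not covered by the answers above.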
Action Items:
1. Feasibility of features (core & memory usage, start methods for programs): Need to modify the TWILL ApplicationMaster to get container information. For MapReduce and Spark, how to get container information is TBD.
2. Need to clarify, for the Memory Usage chart, the difference between Namespace(s) Usage and App Usage.
3. When zooming in to a resolution of an hour, can multiple hours be selected? In each row, what are Detail and Summary?
4. Is it feasible to get resource usage for each namespace?