Goals:

Improve operability in the Hydrator Studio (Improvements to logs, metrics, debuggability)
Improve usability in the Hydrator Studio (Redesign of bottom panel, etc)

Checklist

User stories documented (Bhooshan)
User stories reviewed (Nitin/SreeJon)
Design documented (Bhooshan/Brady)
Design reviewed (Nitin/SreeJon)
Implementation review (Bhooshan)

Use Cases:

Use Case 1: Improve Log Viewer

Problems with current Log Viewer:

Doesn't cater to the usual developer interactions with logs: tailing (monitoring the log file with tail -f), paging through the log with less, or downloading it
Hard to distinguish between two log lines
Exception stack traces are virtually unreadable
Virtually no formatting in the UI: logs are rendered almost exactly as they arrive from the backend, which is not ideal for an end user
No search (even at the UI level)
No way to download logs
No way to distinguish whether logs are live or past
Jira: CDAP-5733 (Cask Community Issue Tracker)

...

As a Hydrator/CDAP user, I want to be able to view my pipeline logs from both currently running pipelines as well as past pipelines to effectively debug the pipeline during failures
As a Hydrator/CDAP user, I want to clearly know if the logs I am viewing are being updated live or are from a past run
As a Hydrator/CDAP user, I want greater emphasis on the most important part of logs - the messages
As a Hydrator user, I do not want logs to be flooded with stack traces. I want the ability to suppress them individually and as a whole
As a Hydrator/CDAP user, I want the ability to download complete log files
As a Hydrator/CDAP user, I want to view a summary of the logs I'm viewing (the number of messages, the number of errors, the number of warnings)
As a Hydrator/CDAP user, I want to be able to filter logs by the lowest log level
As a Hydrator/CDAP user, I want to be able to filter logs by keywords
As a Hydrator/CDAP user, I want to be able to view a larger number of log events with a single-line summary for each, with the capability to drill down into particular events as desired
As a Hydrator/CDAP user, I want to be able to view logs in the selected time range. I want to be able to dynamically change the time range (only start time) for which I want to view logs, with context about how that time range maps to the duration of the program/service run.
As a Hydrator/CDAP user, I want to be able to maximize the log viewer to full screen size and restore it to its original size as required.

Possible solutions

...

Design:

Timeline:
1. Starts at the program/service start time. Ends at the program/service end time (for past runs) or at the current time (for runs still in progress).
2. Time range indicated by two sliders on each side. Time range can be selected by sliding these sliders.
3. Updating slider position causes a refresh of the log viewer to show logs in the selected range with the selected filters
4. If the program/service is still running, the right/bottom end of the timeline indicates the current time. If the slider is at this position, logs are updated live, and the timeline keeps updating to reflect that.
5. Sliders must not cross each other
6. Label on the selected time range indicates the selected time range
7. The timeline is marked with time range with granularity that depends on the duration of the log (which is the duration of the program run).
8. In the selected (or default) time range, there should be symbols on the timeline for the errors and warnings, as well as for events that match the filter in the search box. Clicking on such a symbol should navigate to the corresponding event in the table. The graph may look like so:
  [Image: mock-up of the timeline graph]
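The slider rules above (items 2 to 5) could be sketched roughly as follows. This is only an illustration, not the Hydrator implementation: TimeRange, moveSlider and isLive are hypothetical names, and all times are assumed to be epoch milliseconds.

```typescript
// Hypothetical types and helpers, for illustration only.
interface TimeRange {
  start: number; // position of the first slider, epoch ms (assumed unit)
  end: number;   // position of the second slider, epoch ms (assumed unit)
}

// Clamp a value into [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Move one slider, keeping it inside the run's bounds and preventing
// it from crossing the other slider (rule 5).
function moveSlider(
  range: TimeRange,
  which: "start" | "end",
  target: number,
  runStart: number,
  runEnd: number
): TimeRange {
  if (which === "start") {
    return { start: clamp(target, runStart, range.end), end: range.end };
  }
  return { start: range.start, end: clamp(target, range.start, runEnd) };
}

// Logs are updated live only while the program is still running and the
// end slider sits at the current end of the timeline (rule 4).
function isLive(range: TimeRange, runEnd: number, running: boolean): boolean {
  return running && range.end >= runEnd;
}
```

Each slider move would then trigger a refresh of the log viewer with the returned range and the currently selected filters (rule 3).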
Filters:
1. Filter by lowest log level:
  1. If ERROR is selected, then we show only ERROR
  2. If WARN is selected, then we show ERROR and WARN
  3. If INFO is selected, then we show ERROR, WARN and INFO
  4. If DEBUG is selected, then we show ERROR, WARN, INFO and DEBUG
  5. If TRACE is selected, then we show ERROR, WARN, INFO, DEBUG and TRACE
2. Filter by search keywords:
  1. Search box that filters logs by the search text.
  2. This is a simple filter that applies on the message column
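The two filters above could be combined along these lines. This is a minimal sketch under stated assumptions: LogEvent and filterLogs are hypothetical names, and treating the keyword match as case-insensitive is an assumption that the design does not specify.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG" | "TRACE";

// Hypothetical event shape, for illustration only.
interface LogEvent {
  timestamp: number;
  level: LogLevel;
  message: string;
}

// Levels ordered from most to least severe, matching the list above.
const LEVEL_ORDER: LogLevel[] = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"];

// Keep events at or above the selected lowest level whose message
// contains the search text. An empty keyword matches everything.
function filterLogs(
  events: LogEvent[],
  lowest: LogLevel,
  keyword: string = ""
): LogEvent[] {
  const cutoff = LEVEL_ORDER.indexOf(lowest);
  // Assumption: matching is case-insensitive.
  const needle = keyword.trim().toLowerCase();
  return events.filter(
    (e) =>
      LEVEL_ORDER.indexOf(e.level) <= cutoff &&
      (needle === "" || e.message.toLowerCase().indexOf(needle) !== -1)
  );
}
```

Selecting WARN with an empty search box would therefore keep only ERROR and WARN events, as described in item 1.2.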
Log viewer Table:
1. Columns:
  1. Timestamp
  2. Log Level
  3. Source - Only in CDAP - This column should not be shown in Hydrator
  4. Message (also contains stack trace).
2. Default view shows single-line messages, with expand/collapse buttons to expand individual messages if they have more content
3. Ability to suppress/show the stack trace with similar expand/collapse buttons.
4. Ability to expand all messages
5. Ability to only view the message column
Top Bar:
1. Shows information/summary of the log
2. Indicates program/service name
3. Summary of total messages with number of warnings and errors
4. Download button to download entire log
5. Search box for filtering.
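The summary in the top bar (item 3) amounts to a simple count over the log events. A minimal sketch, assuming log levels arrive as the strings "WARN" and "ERROR"; LogSummary and summarize are hypothetical names.

```typescript
// Hypothetical summary shape for the top bar.
interface LogSummary {
  total: number;
  warnings: number;
  errors: number;
}

// Count all messages, and separately the warnings and errors.
function summarize(levels: string[]): LogSummary {
  const summary: LogSummary = { total: 0, warnings: 0, errors: 0 };
  for (const level of levels) {
    summary.total += 1;
    if (level === "WARN") {
      summary.warnings += 1;
    } else if (level === "ERROR") {
      summary.errors += 1;
    }
  }
  return summary;
}
```

Whether the counts should cover the whole run or only the selected time range is not stated in this design, so the sketch leaves that choice to the caller.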

Required Backend support:

Jira: CDAP-5893 (Cask Community Issue Tracker)

Use Case 2: Bottom Panel

Problems with current bottom panel:

Constant back-and-forth between DAG and bottom panel - click on a node, then view the bottom panel - not very intuitive
Reserved real-estate for configurations that are not commonly updated
Schema available in both bottom panel as well as the DAG
Reduced "prominence" for both the DAG as well as the bottom panel, since you're never using the full available space
Restricted space in the bottom panel for logs, pipeline configuration, node configuration, etc
Association between a DAG and its bottom panel is not always clear enough

...

As a Hydrator Product Team, I want to better plan the Hydrator real-estate so it is not statically allocated for configurations/views that are not commonly used/mandatory to be updated for creating pipelines

e.g. Pipeline configurations like post run actions, engine, schedule

As a Hydrator Product Team, I want to better design the Hydrator UI to lay more emphasis on the DAG
As a Hydrator user, I do not want to switch back-and-forth between the DAG and the bottom panel repeatedly for building my pipeline
- I should be able to provide node-level details right near the node
- I should be able to simultaneously view details for multiple nodes both while editing a pipeline as well as viewing it.
As a Hydrator user, I want to be able to build my pipeline incrementally. I want mandatory information to be more obvious.

Build the pipeline with mandatory fields only to start off
Incrementally add schedule, post run actions, etc

As a Hydrator Product Team, I want to reduce the disparity between the pipeline detail view and the studio view. This will facilitate the move towards being able to edit a pipeline after publishing
- e.g. Reference is unavailable in the pipeline details view
As a Hydrator user, I want the messaging regarding multiple runs from the Hydrator UI to be clearer.

Does Hydrator always show only the last run?
If so, what is the "History" view for?

As a Hydrator Product Team, I want to reduce duplication
- The console is not very useful today, it just shows messages. Can it be reconciled with the notification center?
As a Hydrator user, I want related actions to appear together.
- e.g. "Export" is available in the bottom panel, but other pipeline controls are in the top bar.
As a Hydrator Product team, I want to bring Jump buttons to Hydrator to make them the primary method of viewing entities in different contexts across CDAP, Hydrator and Tracker
- Jump from pipeline details view in Hydrator to program details view in CDAP
- Jump actions for source/sink in Hydrator:
  - View in Dataset Details page in CDAP
  - View in entity details page in Tracker
  - Explore Dataset (if possible) in CDAP

Design:

Proposed Log Viewer:

Composed of two main views:

Viewing current logs along with monitoring (Live)
1. Similar to tail -f
2. Starts off with 50 lines
3. Shows newer logs as they become available towards the end
4. Users can see newer logs if they are 'scroll-positioned' at the last log line
5. Scroll position is retained if users are at any position other than the last log line
6. Previous button
Viewing logs within a specified time range (Not Live)
1. Similar in behavior to less, so it's not live, but allows the following capabilities
2. Time range selector
3. Previous/Next buttons
4. Download button
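The scroll behavior of the live view (items 2 to 5 under "Viewing current logs") could look roughly like this. It is a sketch only: TailView, initialView and appendLive are hypothetical names, and the scroll position is modeled as a line index rather than a pixel offset.

```typescript
// Hypothetical view state, for illustration only.
interface TailView {
  lines: string[];
  scrollIndex: number; // index of the line the user is positioned at
}

const INITIAL_LINES = 50; // the live view starts off with 50 lines

// Start with the most recent lines, positioned at the last one.
function initialView(allLines: string[]): TailView {
  const lines = allLines.slice(-INITIAL_LINES);
  return { lines: lines, scrollIndex: Math.max(lines.length - 1, 0) };
}

// Append newly arrived lines. Follow them only if the user was positioned
// at the last line; otherwise retain the current scroll position.
function appendLive(view: TailView, newLines: string[]): TailView {
  const wasAtEnd = view.scrollIndex >= view.lines.length - 1;
  const lines = view.lines.concat(newLines);
  return {
    lines: lines,
    scrollIndex: wasAtEnd ? lines.length - 1 : view.scrollIndex,
  };
}
```

The Previous button would prepend older lines to the same structure; that part is omitted here.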

Common to both views:

Compact view: A log line is a single line, so you see more logs (even though they are partial) at once.
Expanded view: A log line contains the entire content, including message and stack trace
Suppress stack trace: The stack trace in a log line can be suppressed by clicking something
The compact view, expanded view and stack trace suppression above can each be applied either to all logs or to an individual log line
Logs are tabular, consisting of columns: Timestamp, Log Level, Origin (includes thread name, class name and line number - but these can be split into separate columns if there is a requirement), Message (contains stack trace too).
Error/Warn level logs have some sort of highlighting (a symbol next to the log level?)
Log level column has a drop-down with checkboxes to select only particular log levels - ALL, DEBUG, INFO, WARN, ERROR
The message column can be expanded to the full width of the table, thereby hiding other columns. This operation can be reversed.
Search box that allows filtering log lines with the search text
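The compact view, expanded view and stack trace suppression, applied either globally or per log line, could be modeled along these lines. This is a sketch with hypothetical names (LogLine, ViewOptions, resolveOptions, renderLine); it assumes the stack trace is available separately from the message text.

```typescript
// Hypothetical shapes, for illustration only.
interface LogLine {
  message: string;
  stackTrace?: string; // assumed to be delivered separately from the message
}

interface ViewOptions {
  expanded: boolean;       // false = compact, single-line view
  showStackTrace: boolean; // false = stack trace suppressed
}

// A per-line override, when present, wins over the global setting.
function resolveOptions(
  global: ViewOptions,
  override?: Partial<ViewOptions>
): ViewOptions {
  return {
    expanded:
      override && override.expanded !== undefined
        ? override.expanded
        : global.expanded,
    showStackTrace:
      override && override.showStackTrace !== undefined
        ? override.showStackTrace
        : global.showStackTrace,
  };
}

// Produce the text shown in the Message column for one log line.
function renderLine(line: LogLine, opts: ViewOptions): string {
  if (!opts.expanded) {
    // Compact view: only the first line of the message.
    const firstBreak = line.message.indexOf("\n");
    return firstBreak === -1 ? line.message : line.message.slice(0, firstBreak);
  }
  if (opts.showStackTrace && line.stackTrace) {
    return line.message + "\n" + line.stackTrace;
  }
  return line.message;
}
```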

Design:

The basic premises for this design are:

Make DAG the hero
Not have widgets that occupy real-estate but only show messages like "No XXX for this XXX"
Not occupy real-estate statically with capabilities that are add-ons but not requirements for creating pipelines
Support incremental pipeline development:

Basics first: Configure nodes and get the pipeline working
Add-ons later: Adding a schedule, adding post-run actions, etc

Clearly demarcate Studio/Detail page into two sections:

Canvas: Has the DAG, and capabilities to modify/view/update/reference node level information
Pipeline section: Configure/update/view pipeline level information
Canvas occupies majority real-estate by default. If you want to view/modify pipeline details, that reduces canvas size

For pipeline section, there are two views for most capabilities:

Compact view: Shows the selected pipeline capability (logs, metrics, pipeline configuration) in a smaller drawer, but also shows the canvas (and the DAG).
Full-screen view: Hides the canvas, and only contains the header, footer and the selected capability.
Switching between these two views is supported

Reduce the disparity between the pipeline details view and the studio
- This will help to add the capability to edit a pipeline after publishing

Use Case 3: Debuggability/Testing/Preview

User Stories:

As a Hydrator user, I want to preview my pipeline with a specified set of input records fetched directly from the source for validating my pipeline configuration and behavior before I publish the pipeline.
(Advanced) As a Hydrator user, I want to be able to save snapshots of data previously fetched during preview, and use them later, so I don't have to connect to the source every time I want to preview my pipeline.
As a Hydrator user, I want to test individual plugins in my pipeline by sending in sample input records and viewing the resultant output.
(Advanced) As a Hydrator user, I want to be able to smoke test my pipeline from the UI

Provided with a golden set of input and output, the pipeline should be able to validate itself

(Advanced) As a Hydrator user, I want to be able to smoke test a single plugin from the UI

Provided with a golden set of input and output, the plugin should be able to validate itself

(Advanced) As a Hydrator user, I want to be able to store smoke test data for later use
As a Hydrator user, I want to be able to validate my pipeline more effectively
- If clicking on a 'Validate' button returns successful, then publishing the pipeline should not fail.
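The golden-set smoke test described above could be sketched as follows. This is only an illustration under strong assumptions: a stage is modeled as a plain function from input records to output records, Rec, Stage, GoldenSet and smokeTest are hypothetical names, and the comparison is order-sensitive for both records and fields.

```typescript
// Hypothetical record and stage shapes, for illustration only.
type Rec = { [field: string]: string | number | boolean | null };
type Stage = (input: Rec[]) => Rec[];

interface GoldenSet {
  input: Rec[];
  expected: Rec[];
}

// Run the stage (a single plugin or a whole pipeline) on the golden input
// and compare the result with the golden output.
// Assumption: comparing JSON text is good enough here. It is sensitive to
// both record order and field order.
function smokeTest(stage: Stage, golden: GoldenSet): boolean {
  const actual = stage(golden.input);
  return JSON.stringify(actual) === JSON.stringify(golden.expected);
}
```

The same function covers both the pipeline-level and the plugin-level stories, since the only difference is what the stage represents.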

Design:

Use Case 4: Complex schema management

User Stories:

As a Hydrator user, I want to be able to set complex schemas for my pipeline
- I would like to have fields with enum, array, map, record and union types and would like an efficient method to create/manage them from the UI
As a Hydrator user, I would like to be able to view complex schemas for my pipeline

Design:

Use tabs with customized expansions for complex fields.
Simple types string, int, long, float, double, boolean, bytes can be defined as today
Enum: When an enum is selected, the field name becomes clickable. Expansion allows you to enter enum values.
Array: When an array is selected, the field name becomes clickable. Expansion accepts a data type (which in turn could be a complex type as well, in which case the same flow rules defined here would apply).
Map: When a map is selected, the field name becomes clickable. Expansion accepts a key type and a value type.
Record: When a record is selected, the field name becomes clickable. Expansion allows you to specify a nested record
Union: When a union is selected, the field name becomes clickable. On clicking, you can add schemas. Each schema is of type record.
For viewing, main screen only shows first level (string, int, long, float, double, boolean, bytes, enum, array, map, record and union) data types. For complex types, field names are clickable, and expand to read-only views of the expansions described above.
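The nesting rules above map naturally onto a recursive data structure. The following is a sketch with hypothetical names (Schema, Field, isExpandable, describe), not the actual Hydrator schema model; the one-line text rendering is purely illustrative.

```typescript
type SimpleType =
  | "string" | "int" | "long" | "float" | "double" | "boolean" | "bytes";

// Hypothetical schema model, for illustration only.
interface Field {
  name: string;
  schema: Schema;
}

interface RecordSchema {
  type: "record";
  fields: Field[];
}

type Schema =
  | { type: SimpleType }
  | { type: "enum"; symbols: string[] }
  | { type: "array"; items: Schema }             // items may be complex too
  | { type: "map"; keys: Schema; values: Schema }
  | RecordSchema
  | { type: "union"; schemas: RecordSchema[] };  // each schema is a record

// Complex types get a clickable field name that opens an expansion.
function isExpandable(schema: Schema): boolean {
  return ["enum", "array", "map", "record", "union"].indexOf(schema.type) !== -1;
}

// Render a schema as one line of text, recursing into nested types.
function describe(s: Schema): string {
  switch (s.type) {
    case "enum":
      return "enum(" + s.symbols.join("|") + ")";
    case "array":
      return "array<" + describe(s.items) + ">";
    case "map":
      return "map<" + describe(s.keys) + "," + describe(s.values) + ">";
    case "record":
      return (
        "record{" +
        s.fields.map((f) => f.name + ":" + describe(f.schema)).join(",") +
        "}"
      );
    case "union":
      return "union[" + s.schemas.map((r) => describe(r)).join(",") + "]";
    default:
      return s.type; // simple types
  }
}
```

The main screen would show only the first-level type name (s.type) for each field, with the recursion reserved for the expansions.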

Use Case 5: Pipelines Listing or Pipelines Dashboard

User Stories:

As a Hydrator operations team member, I would like to view all the pipelines running across multiple namespaces that I am authorized for
- I would also like to be able to view only the pipelines that I have access to.
As a Hydrator operations team member, I would like to see following fields for each pipeline
- Pipeline Fields
  - Namespace
  - Pipeline Name
  - Status (RUNNING | SUCCESS | FAILED)
  - Number of Active Runs
  - Total Number of Runs
  - Last Start Time
  - Last Run Finish Time
  - User
As a Hydrator operations team member, I should be able to filter on above specified fields to get to the pipeline status
As a Hydrator operations team member, I should be able to filter all the pipelines (including pipelines across namespaces) based on Start date, End date and Status
As a Hydrator operations team member, I should be able to filter on a specified field and continue to filter the results using other fields. (nested filtering)
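Nested filtering over the fields listed above can be modeled as a chain of predicates, each applied to the result of the previous one. A minimal sketch; PipelineRow, RowFilter, applyFilters, byStatus, byNamespace and startedBetween are hypothetical names, and times are assumed to be epoch milliseconds.

```typescript
type PipelineStatus = "RUNNING" | "SUCCESS" | "FAILED";

// Hypothetical row shape based on the fields listed above.
interface PipelineRow {
  namespace: string;
  name: string;
  status: PipelineStatus;
  activeRuns: number;
  totalRuns: number;
  lastStartTime: number;     // epoch ms (assumed unit)
  lastRunFinishTime: number; // epoch ms (assumed unit)
  user: string;
}

type RowFilter = (row: PipelineRow) => boolean;

// Apply filters one after another, so each narrows the previous result.
function applyFilters(rows: PipelineRow[], filters: RowFilter[]): PipelineRow[] {
  return filters.reduce((remaining, f) => remaining.filter(f), rows);
}

function byStatus(status: PipelineStatus): RowFilter {
  return (row) => row.status === status;
}

function byNamespace(namespace: string): RowFilter {
  return (row) => row.namespace === namespace;
}

function startedBetween(from: number, to: number): RowFilter {
  return (row) => row.lastStartTime >= from && row.lastStartTime <= to;
}
```

A user who first filters on status and then on namespace would correspond to applyFilters(rows, [byStatus("FAILED"), byNamespace("default")]).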

Use Case 6: System Mode

User Stories:

As a Hydrator user, I want to know if I'm working in distributed or stand-alone mode at all times.
As a Hydrator user, I want to know if I'm working in secure or insecure mode at all times.

Design:

Use Case 7: Join Node

Jira: CDAP-6371 (Cask Community Issue Tracker)

Use Case 8: Run Configuration

Jira: CDAP-6370 (Cask Community Issue Tracker)

Scratch Pad:

Work Streams:

Tech Debt

Simplify Config Store
Simplify DAG component ~ Ajai's hack

Moving hard-coding/logic to backend

Drafts
Default plugin version
For a stage, define whether it can accept an Input, Output or both
Single APIs for status/logs/metrics for hydrator pipelines

New features

Preview
Log Viewer

Possible solutions for Log Viewer

Tabular view: Columns for date, Class Name/Thread Name, Log Level, Log Line
Alternate row background colors
Vertically expandable with scrolling
Searchable (Filter-able) columns
Clear demarcation of rows
Snippet with expand - especially for stack traces
Picking one or more log levels: INFO, DEBUG, WARN, ERROR, ALL
Ability to view and download raw logs if required
Ability to view and expand only the "content" column of a log line
