Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

The home page of the CDAP UI shows all the available entities in CDAP, and lets users narrow them down by searching and filtering. The backend search API should support sorting and pagination for this view.

Goals

  1. Allow users to specify a field to sort by as well as a sorting order
  2. Support only name and creation-time as the sort fields for now
  3. Support sort order (ascending and descending)
  4. Support only one combination of sort parameters per request
  5. Support pagination by way of offset and limit parameters
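
To make these goals concrete, the following is a minimal validation sketch, not the CDAP implementation. The names SearchPage and parse_search_page, and the literal values "asc" and "desc", are assumptions made for illustration; this document fixes only the two sort fields and the two sort directions.

```python
from dataclasses import dataclass

# Assumed spellings: the document names the fields and directions,
# but not the exact strings the API accepts.
ALLOWED_SORT_FIELDS = {"name", "creation-time"}
ALLOWED_SORT_ORDERS = {"asc", "desc"}


@dataclass(frozen=True)
class SearchPage:
    """One combination of sort parameters plus a pagination window."""
    sort_field: str
    sort_order: str
    offset: int
    limit: int


def parse_search_page(sort_field, sort_order, offset, limit):
    """Validate a single sort field/order pair together with offset and limit."""
    if sort_field not in ALLOWED_SORT_FIELDS:
        raise ValueError(f"unsupported sort field: {sort_field!r}")
    if sort_order not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"unsupported sort order: {sort_order!r}")
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return SearchPage(sort_field, sort_order, offset, limit)
```

Accepting exactly one field and one order per call mirrors goal 4: there is no way to pass a second sort key.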

User Stories 

  • As a CDAP user, I would like to sort entities on the home page by specifying a sort field and a sort order
  • As a CDAP user, I would like to paginate search results on the home page for easy navigation. I'd like to specify the page size, and be able to jump to a page from the specified options.

Design

The design for this feature can be broken down into the following high-level (ideal) steps, for a given search query, sort order, sort field, offset and limit:

  1. Fetch all the search results
  2. Sort them by the specified sort field in the specified sort order
  3. In the ordered search results, pick items between indexes offset and offset + limit.
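
The three ideal steps can be sketched in a few lines. This is an illustration only: entities are modeled as plain dictionaries and the matches predicate stands in for the real search, both of which are assumptions.

```python
def search_page(entities, matches, sort_field, descending, offset, limit):
    """Naive in-memory version of the three ideal steps."""
    # Step 1: fetch all the search results.
    results = [e for e in entities if matches(e)]
    # Step 2: sort by the requested field in the requested order.
    results.sort(key=lambda e: e[sort_field], reverse=descending)
    # Step 3: keep only the items between offset and offset + limit.
    return results[offset:offset + limit]
```

Every call materializes and sorts the full result set, even when only a handful of items are returned.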

The major problem with this approach is that step 1 could return a large number of results, which makes steps 2 and 3 expensive to perform in memory. In more detail:

Sorting: We would like to specify a sort field and sort order. However, given that scans in HBase are only lexicographically ordered by the rowkey, we would need to have rowkeys containing each of the supported sort fields, leading to duplicate storage.
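
The storage cost can be illustrated with a small simulation. The rowkey layout below (an "n:" prefix for name, a "t:" prefix for a zero-padded creation time) is a hypothetical scheme invented for this sketch, not CDAP's actual key format.

```python
def index_rows(entity_id, name, creation_time_ms):
    """Return one rowkey per supported sort field for a single entity."""
    return [
        f"n:{name.lower()}:{entity_id}",
        # Zero-pad so that lexicographic order matches numeric order.
        f"t:{creation_time_ms:020d}:{entity_id}",
    ]


def scan_ids(rows, prefix):
    """Mimic a prefix scan: rows come back in lexicographic rowkey order."""
    return [r.rsplit(":", 1)[1] for r in sorted(rows) if r.startswith(prefix)]


rows = []
for entity in [("e1", "purchase", 300), ("e2", "history", 100), ("e3", "dev", 200)]:
    rows.extend(index_rows(*entity))
```

Three entities produce six rows: each entity is stored once per sort field, which is the duplication described above.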

Pagination: Ideally, we would like to only fetch results between offset and offset + limit from the storage (HBase). However, given that in HBase you cannot specify a scan to start at the nth row, this is not possible.
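
A small simulation shows what this costs. Without a way to start at the nth row, an offset has to be implemented by reading rows from the beginning and discarding them. The scan and page_by_offset helpers are hypothetical stand-ins for a storage scan.

```python
import itertools


def scan(rows, stats):
    """Yield rows in order, counting how many were read from 'storage'."""
    for row in rows:
        stats["rows_read"] += 1
        yield row


def page_by_offset(rows, offset, limit):
    """Return the requested page and the number of rows that had to be read."""
    stats = {"rows_read": 0}
    page = list(itertools.islice(scan(rows, stats), offset, offset + limit))
    return page, stats["rows_read"]
```

For example, asking for 10 rows at offset 900 reads 910 rows in this model, so the cost grows with the page number rather than with the page size.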

Another major problem arises with complex queries. CDAP currently splits queries on special characters (such as whitespace), searches for each component, weights each result by how many components it contains, and returns the results in descending order of weight. To assign weights, the algorithm needs to see the entire result set, so pagination cannot be applied before the weights are assigned.
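
The weighting step can be sketched roughly as follows. The index structure (a mapping from a term to the set of entity ids that contain it), the split on non-word characters, and the tie-break by entity id are all assumptions for illustration.

```python
import re
from collections import Counter


def weighted_search(query, index):
    """Rank entity ids by how many query components they contain."""
    # Split on special characters such as whitespace (assumed: any non-word run).
    terms = [t for t in re.split(r"\W+", query.lower()) if t]
    weights = Counter()
    for term in terms:
        for entity_id in index.get(term, ()):
            weights[entity_id] += 1
    # Descending weight; ties broken by id only to keep the output deterministic.
    return sorted(weights, key=lambda e: (-weights[e], e))
```

An entity's weight is only final once every component has been searched in full, which is why a partial fetch can rank results incorrectly.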

Approach

Approach #1

This approach is similar to how pagination is done for logs.

  1. Maintain a startRow. If offset is 0, startRow is the first row; otherwise it is the cursor parameter returned in the previous search response.
  2. Fetch limit results starting from startRow, using PageFilter.
  3. In the response, return the rowkey of the last result as the cursor.
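
The steps above can be sketched against a sorted list of rowkeys standing in for an HBase table. The fetch_page helper is hypothetical, and resuming strictly after the cursor (rather than at it) is an assumption; the steps do not say whether the cursor row is included in the next page.

```python
import bisect


def fetch_page(sorted_rowkeys, limit, cursor=None):
    """Return up to `limit` rowkeys after `cursor`, plus the next cursor."""
    # No cursor means this is the first page, so start at the first row.
    start = 0 if cursor is None else bisect.bisect_right(sorted_rowkeys, cursor)
    page = sorted_rowkeys[start:start + limit]
    # The rowkey of the last result becomes the cursor for the next request.
    next_cursor = page[-1] if page else None
    return page, next_cursor
```

A client can only move forward by passing back the cursor it last received. There is no way to compute the cursor for an arbitrary page without walking through the pages before it.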

Limitations of this approach:

  1. Depends on natural ordering from HBase for sorting. Hence, the storage footprint would increase, since we'd have to store duplicate rows for each sort field.
  2. The algorithm to assign weights will not be correct, since we only do a partial fetch from HBase.
  3. This algorithm works well for pagination when the use case is to always return the next n results, as with logs. It does not work when you want to jump to an arbitrary page or go back a page.

Approach #2

To counter the problems specified above, the following algorithm is proposed:

  1. Define an in-memory constant: DEFAULT_SEARCH_FETCH_SIZE. Say this is 1000.
  2. If offset + limit > DEFAULT_SEARCH_FETCH_SIZE, the fetch size will be DEFAULT_SEARCH_FETCH_SIZE + offset + limit
  3. To assign weights correctly, return fetch size number of search results for each component of the search query.
    1. So, for the query "purchase dev history", this step will fetch 3 × the fetch size
  4. Sort the response of step 3 by the specified sort field in memory with the specified order
  5. Return the results between offset and offset + limit from step 4.
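
A rough sketch of this algorithm follows. Several details are assumptions rather than part of the proposal: the fetch size falls back to DEFAULT_SEARCH_FETCH_SIZE when offset + limit does not exceed it, query components are split on whitespace, results are de-duplicated by an "id" key, and fetch_for_term is a hypothetical callback standing in for the storage lookup.

```python
DEFAULT_SEARCH_FETCH_SIZE = 1000


def fetch_size(offset, limit):
    """Step 2: grow the fetch size when the requested window exceeds the default."""
    if offset + limit > DEFAULT_SEARCH_FETCH_SIZE:
        return DEFAULT_SEARCH_FETCH_SIZE + offset + limit
    # Assumption: the proposal does not state the fallback explicitly.
    return DEFAULT_SEARCH_FETCH_SIZE


def search(query, fetch_for_term, sort_field, descending, offset, limit):
    """Steps 3 to 5: fetch per component, sort in memory, then slice."""
    size = fetch_size(offset, limit)
    merged = {}
    for term in query.split():
        # Step 3: up to `size` results for each component of the query.
        for entity in fetch_for_term(term, size):
            merged[entity["id"]] = entity  # de-duplication by id is an assumption
    # Step 4: sort in memory by the requested field and order.
    ordered = sorted(merged.values(), key=lambda e: e[sort_field], reverse=descending)
    # Step 5: return the window between offset and offset + limit.
    return ordered[offset:offset + limit]
```

With these assumptions, a request for offset 990 and limit 20 uses a fetch size of 2010 per component, so a three-component query reads at most 6030 results before sorting.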

API changes

New Programmatic APIs

N/A

Deprecated Programmatic APIs

N/A

Updated REST APIs

Path: /v3/namespaces/<namespace-id>/search

Method: GET

Description: Additional parameters will be added for this API:

  1. sortField
  2. sortOrder
  3. offset
  4. limit

Response Code:

  • 200 - On success
  • 404 - When application is not available
  • 500 - Any internal errors

Response: (not specified)
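
As an illustration of how a client might call the updated endpoint, the sketch below builds a request URL. Only the path and the four new parameter names come from this page. The query parameter, the example values "name" and "asc", and the host and port are assumptions.

```python
from urllib.parse import urlencode


def search_url(base_url, namespace, query, sort_field, sort_order, offset, limit):
    """Build a search URL carrying the sorting and pagination parameters."""
    params = {
        "query": query,            # assumed name for the search-text parameter
        "sortField": sort_field,
        "sortOrder": sort_order,
        "offset": offset,
        "limit": limit,
    }
    return f"{base_url}/v3/namespaces/{namespace}/search?{urlencode(params)}"


# Hypothetical host and port, for illustration only.
url = search_url("http://localhost:11015", "default",
                 "purchase dev history", "name", "asc", 0, 10)
```

urlencode takes care of escaping, so a multi-word query is safe to pass as-is.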

 

     

Deprecated REST API

Path: /v3/apps/<app-id>

Method: GET

Description: Returns the application spec for a given application

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • The UI was already updated in the 4.0 preview release to send the new parameters. There may be minor updates during implementation.

Security Impact 

The search API already returns only authorized results; this behavior will not change.

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design handles them

Test Scenarios

Test ID    Test Description    Expected Results
   
   
   
   

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work
