
Overview

This document captures the design of enhancements to data discovery in 4.0. Its main goal is to serve the Listing Center Home Page of CDAP 4.0.

...

  •  User stories documented (Bhooshan)
  •  User stories reviewed (Nitin)
  •  User stories reviewed (Todd)
  •  Requirements documented (Bhooshan)
  •  Requirements Reviewed (Nitin/Todd)
  •  Design Documented (Bhooshan)
  •  Design Reviewed (Andreas/Terence/Poorna)
  •  Implementation
  •  Documentation

...

Requirements

The main requirements influencing these enhancements are:

...

Most research indicates feature parity between the two options, although Elasticsearch seems to have better REST API and JSON support. However, because Apache Solr is more favored in the Hadoop ecosystem (it is supported by more distributions, is the only search engine that Cloudera supports, and has support in Slider to run on YARN), it makes more sense as the first candidate for a search backend. The search backend can nevertheless be made pluggable (as an extension loaded in its own classloader through an SPI), so it could be swapped out for Elasticsearch in the future if users wish.
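
The pluggability idea can be illustrated with a minimal sketch. CDAP itself is Java, and the text above proposes an SPI loaded in its own classloader; the Python below only shows the shape of such a contract. The names SearchBackend, InMemoryBackend, BACKENDS and load_backend are hypothetical and do not come from CDAP, Solr or Elasticsearch.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical contract that every pluggable search backend would implement."""

    @abstractmethod
    def index(self, entity: str, text: str) -> None:
        """Make `entity` discoverable through the words in `text`."""

    @abstractmethod
    def search(self, query: str) -> list:
        """Return the names of entities whose indexed text contains `query`."""


class InMemoryBackend(SearchBackend):
    """Toy stand-in for a real backend such as Solr (illustration only)."""

    def __init__(self):
        self._docs = {}

    def index(self, entity, text):
        self._docs[entity] = text.lower()

    def search(self, query):
        needle = query.lower()
        return sorted(e for e, t in self._docs.items() if needle in t)


# Registry keyed by a configured backend name; a Java SPI would discover
# implementations at runtime instead of listing them by hand.
BACKENDS = {"memory": InMemoryBackend}


def load_backend(name: str) -> SearchBackend:
    """Instantiate the backend registered under `name`."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown search backend: {name}") from None
```

Callers would depend only on SearchBackend, so replacing one backend with another would, under these assumptions, be a configuration change rather than a code change.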

...

The response would contain two fields in addition to the input parameters above:

  1. results - contains the set of search results matching the search query.
  2. total - specifies the total number of matched entities. This can be used to calculate the number of pages.
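
As a rough sketch of how a client might use total for paging: the helpers below derive the page count and the offset for a given page, then build a search URL. The function names are hypothetical, and the sketch assumes that offset is zero-based and that pages are numbered from 1; the text above does not state either.

```python
from urllib.parse import urlencode


def page_count(total: int, size: int) -> int:
    """Number of pages needed to show `total` matches at `size` per page."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (total + size - 1) // size  # integer ceiling division


def page_offset(page: int, size: int) -> int:
    """Offset of the first result on a 1-based page (assumes zero-based offsets)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * size


def search_url(base: str, namespace: str, page: int, size: int) -> str:
    """Build the metadata search URL for one page of results."""
    query = urlencode({"offset": page_offset(page, size), "size": size})
    return f"{base}/v3/namespaces/{namespace}/metadata/search?{query}"
```

For example, 142 matches at 10 per page give 15 pages, and page 6 starts at offset 50.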

...

$ curl "http://localhost:11015/v3/namespaces/default/metadata/search?offset=50&size=2"
{
  "sort": "name asc,created_time desc",
  "offset": 50,
  "size": 2,
  "total": 142,
  "results": [
    {
      "entityId":{
         "id":{
            "applicationId":"PurchaseHistory",
            "namespace":{
               "id":"default"
            }
         },
         "type":"application"
      },
      "metadata":{
         "SYSTEM":{
            "properties":{
               "Flow:PurchaseFlow":"PurchaseFlow",
               "MapReduce:PurchaseHistoryBuilder":"PurchaseHistoryBuilder"
            },
            "tags":[
               "Purchase",
               "PurchaseHistory"
            ]
         }
      }
    },
    {
      "entityId":{
         "id":{
            "instanceId":"history",
            "namespace":{
               "id":"default"
            }
         },
         "type":"datasetinstance"
      },
      "metadata":{
         "SYSTEM":{
            "properties":{
               "type":"co.cask.cdap.examples.purchase.PurchaseHistoryStore"
            },
            "tags":[
               "history",
               "explore",
               "batch"
            ]
         }
      }
    }
  ]
}
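
A consumer such as the UI could flatten a response of this shape into rows for display. The sketch below is illustrative only: summarize is a hypothetical helper, SAMPLE is a trimmed copy of the response above, and the code assumes each id object holds exactly one type-specific name key (such as applicationId or instanceId) next to namespace, as in the two results shown.

```python
SAMPLE = {
    "total": 142,
    "results": [
        {
            "entityId": {
                "id": {"applicationId": "PurchaseHistory", "namespace": {"id": "default"}},
                "type": "application",
            },
            "metadata": {"SYSTEM": {"tags": ["Purchase", "PurchaseHistory"]}},
        },
        {
            "entityId": {
                "id": {"instanceId": "history", "namespace": {"id": "default"}},
                "type": "datasetinstance",
            },
            "metadata": {"SYSTEM": {"tags": ["history", "explore", "batch"]}},
        },
    ],
}


def summarize(response: dict) -> list:
    """Return (type, namespace, name, sorted system tags) for each search result."""
    rows = []
    for result in response["results"]:
        entity = result["entityId"]
        ident = entity["id"]
        # Assumption: the entity name sits under the only key that is not "namespace".
        name = next((v for k, v in ident.items() if k != "namespace"), None)
        system = result.get("metadata", {}).get("SYSTEM", {})
        rows.append(
            (entity["type"], ident["namespace"]["id"], name, sorted(system.get("tags", [])))
        )
    return rows
```

Entity types whose ids carry more than one field besides namespace would need their own handling.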

...

For 2 and 3, an alternative would be to provide a UI-only (undocumented) batch endpoint.

...

Dataset Types in Metadata System

Currently, the Metadata System only supports artifacts, applications, programs, datasets, streams and stream views as entities. Is support for dataset types and modules necessary for 4.0?