
Goals: 

  • CDAP contains multiple entities, for example Namespaces, Applications, Programs, and Datasets (there could also be fine-grained entities such as Partitions in a PFS Dataset or the fields in a Table Dataset).
    We have system and business metadata for each of these entities. We should be able to push this metadata to external metadata management systems such as Cloudera Navigator and Apache Atlas, henceforth referred to as MDM.

...

  •  User stories documented (Gokul)
  •  User stories reviewed (Nitin)
  •  Design documented (Gokul)
  •  Design reviewed (Andreas)
  •  Feature merged (Gokul)
  •  Examples and guides (Gokul)
  •  Integration tests (Gokul) 
  •  Documentation for feature (Gokul)
  •  Blog post (Gokul)


User Stories:
 

  • CDAP business and system metadata entities should automatically show up in MDM
  • CDAP users should be able to search for CDAP business and system metadata using MDM
  • Any updates/deletes to system or business metadata in CDAP should automatically be reflected in MDM
  • Users should be able to search on dataset or stream schema fields (fine-grained entities) in MDM
  • Existing metadata (data that existed before MDM integration was enabled) should also be made available in MDM (depends on whether messages are available in Kafka) (Low priority)
  • Updates/deletion of custom metadata in MDM should be reflected in CDAP (Low priority)
  • Advanced User Requirement: Pushing business metadata of CDAP entities to their underlying entities. For example, if a CDAP Table dataset is marked as ‘sensitive’, this tag should be pushed to the corresponding HBase Table created by CDAP (Low priority)
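
The first few user stories (entities showing up in MDM, updates and deletes being reflected, and search on schema fields) can be illustrated with a minimal sketch. This is not the CDAP or Navigator API: the names EntityId, MetadataRecord, MetadataChange, InMemoryMdm and sync are all hypothetical, and the in-memory store only stands in for a real MDM such as Navigator or Atlas. It assumes metadata changes arrive as a stream of upsert/delete events.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class EntityId:
    """Hypothetical identifier of a CDAP entity, e.g. ("dataset", "default.purchases")."""
    entity_type: str
    name: str


@dataclass
class MetadataRecord:
    """Business metadata (tags, properties) plus schema field names of one entity."""
    tags: Set[str] = field(default_factory=set)
    properties: Dict[str, str] = field(default_factory=dict)
    schema_fields: Set[str] = field(default_factory=set)


@dataclass
class MetadataChange:
    """Hypothetical change event; action is either "upsert" or "delete"."""
    action: str
    entity: EntityId
    record: MetadataRecord = field(default_factory=MetadataRecord)


class InMemoryMdm:
    """Stand-in for an external MDM; a real sink would call the MDM's client instead."""

    def __init__(self) -> None:
        self._store: Dict[EntityId, MetadataRecord] = {}

    def apply(self, change: MetadataChange) -> None:
        # Upserts replace the stored record; deletes remove it if present.
        if change.action == "upsert":
            self._store[change.entity] = change.record
        elif change.action == "delete":
            self._store.pop(change.entity, None)
        else:
            raise ValueError(f"unknown action: {change.action}")

    def search(self, term: str) -> List[EntityId]:
        # Exact match against tags, property values, or schema field names.
        hits = [
            entity for entity, rec in self._store.items()
            if term in rec.tags
            or term in rec.properties.values()
            or term in rec.schema_fields
        ]
        return sorted(hits, key=lambda e: (e.entity_type, e.name))


def sync(changes: Iterable[MetadataChange], mdm: InMemoryMdm) -> None:
    """Apply metadata change events to the MDM in the order they arrive."""
    for change in changes:
        mdm.apply(change)


if __name__ == "__main__":
    mdm = InMemoryMdm()
    purchases = EntityId("dataset", "default.purchases")
    record = MetadataRecord(tags={"sensitive"}, schema_fields={"customer_id", "price"})
    sync([MetadataChange("upsert", purchases, record)], mdm)
    print(mdm.search("sensitive"))    # found by tag
    print(mdm.search("customer_id"))  # found by schema field
    sync([MetadataChange("delete", purchases)], mdm)
    print(mdm.search("sensitive"))    # no longer present after the delete
```

Modelling changes as ordered events keeps the sink idempotent per entity: replaying the same upsert or delete leaves the MDM in the same state, which matters if existing metadata is later backfilled from stored messages.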

Design:

Technical Constraints

...

for Cloudera Navigator

Navigator currently pulls in data periodically from different Hadoop components (HDFS, Hive, etc.) and uses Solr for indexing. However, Navigator does provide a simple Java client to set and query metadata.
Though it is limited in its features, it can potentially be used to push custom metadata for entities to Navigator. But there are a few known and unknown issues:
 

...