Infrastructure components used by Cask Data Application Platform (CDAP)

The following infrastructure components are used by CDAP and by CDAP applications running in CDAP. They are listed in no particular order.
  • HDFS
  • HBase
  • Hive
  • Kafka
  • YARN
  • ZooKeeper
  • KMS
  • Sentry ???

Functional use of infrastructure components

This section describes what each underlying component is used for.
HDFS
  • CDAP Stream
  • Apache Tephra WAL
  • Deployed Application Artifact and Dataset Artifact
  • Aggregated Logs
  • CDAP Fileset Dataset
  • YARN distributed cache 
  • Coprocessor jars 
HBase
  • CDAP system data and metadata (e.g., Preferences, Application, Namespace, Artifact…)
  • Metrics Cube
  • Lineage
  • Workflow Statistics
  • Run Record and Statistics
  • Checkpoint information
  • CDAP Table Dataset
Kafka
  • Logs
  • Metrics
  • Audit Logs (Will be moved to HBase in 4.0)
  • Metadata updates (Will be moved to HBase in 4.0)
  • Notifications (Will be moved to HBase in 4.x)
YARN
  • System Services
  • User applications
ZooKeeper
  • Routing Tables
  • Coordination
  • Secret keys 
    • Auth keys
Hive
  • Dataset integration 
    • Schema
    • Properties
    • Serde
KMS
  • User secrets (e.g., passwords, access tokens)
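
The component-to-use outline above can also be written as a lookup table, which makes it easier to answer questions such as "which components hold log data?". The following is a minimal sketch in Python that only restates the lists on this page. The names INFRA_USES and components_storing are hypothetical and chosen for illustration; they are not part of any CDAP API.

```python
# Illustrative only: a plain mapping of the component-to-use outline on this page.
# INFRA_USES and components_storing are hypothetical names, not CDAP APIs.
INFRA_USES = {
    "HDFS": [
        "CDAP Stream",
        "Apache Tephra WAL",
        "Deployed Application Artifact and Dataset Artifact",
        "Aggregated Logs",
        "CDAP Fileset Dataset",
        "YARN distributed cache",
        "Coprocessor jars",
    ],
    "HBase": [
        "CDAP system data and metadata",
        "Metrics Cube",
        "Lineage",
        "Workflow Statistics",
        "Run Record and Statistics",
        "Checkpoint information",
        "CDAP Table Dataset",
    ],
    "Kafka": [
        "Logs",
        "Metrics",
        "Audit Logs",
        "Metadata updates",
        "Notifications",
    ],
    "YARN": ["System Services", "User applications"],
    "ZooKeeper": ["Routing Tables", "Coordination", "Secret keys (auth keys)"],
    "Hive": ["Dataset integration (schema, properties, SerDe)"],
    "KMS": ["User secrets"],
}


def components_storing(keyword):
    """Return the components whose listed uses mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        component
        for component, uses in INFRA_USES.items()
        if any(needle in use.lower() for use in uses)
    ]


if __name__ == "__main__":
    # Print the components that mention each keyword in their listed uses.
    for word in ("logs", "metrics", "secret"):
        print(word, "->", components_storing(word))
```

The search is a simple substring match on the descriptions listed here, so it reflects only what this page states and not the actual configuration of a running cluster.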