
Goals

  • CDAP and CDAP applications should be able to withstand short, transient infrastructure outages.
  • While one or more underlying services are interrupted, CDAP and CDAP applications can keep operating with degraded performance or limited functionality.
    • Users will not be able to perform operations such as deploying applications, starting programs, or other new data or application lifecycle operations.
      • However, all applications that are already running should keep running.
  • Once the interruption is resolved and the underlying services return to normal operation, CDAP and CDAP applications return to their normal state.
  • Interruptions in scope are those caused by node failures, service failures, or compatible rolling upgrades or downgrades in progress.
  • Does not include incompatible upgrades or downgrades of the underlying infrastructure.
  • Does not include prolonged unavailability of services or infrastructure.
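
The degraded-mode goal can be illustrated with a small admission check. The sketch is hypothetical. The `DegradedModeGate` class and the operation names are illustrative assumptions, not CDAP APIs. While any underlying service is marked down, it refuses new lifecycle operations such as deploying apps and starting programs. It leaves already-running programs alone and accepts lifecycle operations again once every service has recovered.

```python
# Hypothetical sketch of the degraded-mode policy; names are illustrative, not CDAP APIs.
LIFECYCLE_OPERATIONS = frozenset({"deploy_app", "start_program"})


class DegradedModeGate:
    """Tracks unavailable underlying services and gates new lifecycle operations."""

    def __init__(self) -> None:
        self.unavailable = set()       # underlying services currently considered down
        self.running_programs = set()  # programs that are already running

    def mark_down(self, service: str) -> None:
        self.unavailable.add(service)

    def mark_up(self, service: str) -> None:
        self.unavailable.discard(service)

    def degraded(self) -> bool:
        return bool(self.unavailable)

    def allows(self, operation: str) -> bool:
        # Refuse new lifecycle operations while any service is down;
        # nothing here stops or touches programs that are already running.
        return not (self.degraded() and operation in LIFECYCLE_OPERATIONS)

    def start_program(self, name: str) -> bool:
        if not self.allows("start_program"):
            return False
        self.running_programs.add(name)
        return True


gate = DegradedModeGate()
gate.start_program("etl-pipeline")        # normal mode: accepted
gate.mark_down("HBase")                   # e.g. a region server outage
print(gate.start_program("report-job"))   # False: refused in degraded mode
print(gate.running_programs)              # {'etl-pipeline'}: keeps running
gate.mark_up("HBase")                     # service back to normal operation
print(gate.start_program("report-job"))   # True
```

In a real deployment the up/down signal would come from health checks or client-side failures. This sketch just takes it as given.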

Open Item / Discussion Point

  • Define what counts as a long outage versus a short or transient outage.

Infrastructure components used by Cask Data Application Platform (CDAP)

The following underlying infrastructure components are used by CDAP and/or by CDAP applications running in CDAP. They are listed in no particular order.
  • HDFS
  • HBase
  • Hive
  • Kafka
  • YARN
  • ZooKeeper
  • KMS

Functional use of infrastructure components

This section describes what each underlying component is used for.
HDFS
  • CDAP Stream
  • Apache Tephra write-ahead log (WAL)
  • Deployed application artifacts and dataset artifacts
  • Aggregated Logs
  • CDAP Fileset Dataset
  • YARN distributed cache
  • HBase coprocessor JARs
HBase
  • CDAP system data and metadata (e.g., preferences, applications, namespaces, artifacts…)
  • Metrics Cube
  • Lineage
  • Workflow Statistics
  • Run Records and Statistics
  • Checkpoint information
  • CDAP Table Dataset

Kafka
  • Logs
  • Metrics
  • Audit Logs (to be moved to HBase in 4.0)
  • Metadata updates (to be moved to HBase in 4.0)
  • Notifications (to be moved to HBase in 4.x)

YARN
  • System Services
  • User applications

ZooKeeper
  • Routing Tables
  • Coordination
  • Secret keys
    • Authentication keys

Hive
  • Dataset integration 
    • Schema
    • Properties
    • SerDe
KMS
  • User secrets (e.g., passwords, access tokens)
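
To reason about which CDAP functions a given outage touches, the component lists above can be read as a lookup table. This is a minimal sketch built on a hypothetical `impacted_functions` helper, with shortened labels for each use. Sub-items, such as the Hive schema, properties, and SerDe, are folded into their parent entry. The Kafka entries slated to move to HBase in 4.0/4.x would need updating once that happens.

```python
# Component -> CDAP functions, transcribed (with shortened labels) from the lists above.
FUNCTIONAL_USES = {
    "HDFS": ["CDAP Stream", "Tephra WAL", "Application and dataset artifacts",
             "Aggregated logs", "CDAP Fileset dataset", "YARN distributed cache",
             "Coprocessor jars"],
    "HBase": ["CDAP system data/metadata", "Metrics cube", "Lineage",
              "Workflow statistics", "Run records and statistics",
              "Checkpoint information", "CDAP Table dataset"],
    "Kafka": ["Logs", "Metrics", "Audit logs", "Metadata updates", "Notifications"],
    "YARN": ["System services", "User applications"],
    "ZooKeeper": ["Routing tables", "Coordination", "Secret keys"],
    "Hive": ["Dataset integration"],
    "KMS": ["User secrets"],
}


def impacted_functions(unavailable):
    """Return, sorted, every CDAP function that depends on an unavailable component."""
    unknown = set(unavailable) - set(FUNCTIONAL_USES)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return sorted({use for component in unavailable for use in FUNCTIONAL_USES[component]})


print(impacted_functions(["YARN", "KMS"]))
# ['System services', 'User applications', 'User secrets']
```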

Failure Scenarios

  • HDFS
    • Upgrade
    • Downgrade
    • Restart
    • Data Node Outage
  • HBase
    • Upgrade
    • Downgrade
    • Restart
    • Region Server Outage
  • ZooKeeper
    • Upgrade
    • Downgrade
    • Network Partition 
  • YARN
    • Upgrade
    • Downgrade
    • Node Manager Outage
    • RM Outage
  • Kafka
    • Upgrade
    • Downgrade
    • Disk Outage
  • KMS
    • Upgrade 
    • Downgrade 
    • Outage
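
One way to turn the scenario list into a resilience test plan is to flatten it into (component, scenario) cases. Each case can then be checked against the expectations from the Goals section. This is a hypothetical sketch. `failure_cases`, `resilience_plan`, and the wording of the expectations are assumptions; only the components and scenarios come from the list above. Hive has no entry in that list, so it has none here either.

```python
# Components and scenarios copied from the Failure Scenarios list above.
FAILURE_SCENARIOS = {
    "HDFS": ["Upgrade", "Downgrade", "Restart", "Data Node Outage"],
    "HBase": ["Upgrade", "Downgrade", "Restart", "Region Server Outage"],
    "ZooKeeper": ["Upgrade", "Downgrade", "Network Partition"],
    "YARN": ["Upgrade", "Downgrade", "Node Manager Outage", "RM Outage"],
    "Kafka": ["Upgrade", "Downgrade", "Disk Outage"],
    "KMS": ["Upgrade", "Downgrade", "Outage"],
}

# Hypothetical wording of the checks implied by the Goals section.
EXPECTATIONS = (
    "programs that were already running keep running",
    "CDAP returns to its normal state after the service recovers",
)


def failure_cases():
    """Flatten the matrix into (component, scenario) pairs."""
    return [(component, scenario)
            for component, scenarios in FAILURE_SCENARIOS.items()
            for scenario in scenarios]


def resilience_plan():
    """Pair every failure case with every expectation to be verified."""
    return [(component, scenario, check)
            for component, scenario in failure_cases()
            for check in EXPECTATIONS]


print(len(failure_cases()), len(resilience_plan()))  # 21 42
```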

Initiatives In Progress

  • [3.6] CDAP Service version and upgrade support
  • [3.6] Application versioning
  • [4.0] Messaging Service, with the goal of centralizing all transactional metadata activity in HBase
  • [4.0?] Non-transactional datasets
  • [4.0?] HBase coprocessor upgrade management: handle minor version changes efficiently without disabling HBase tables
  • [4.0?] Upgrade tool improvements: remove the coprocessor upgrade step, faster data conversions where needed, and smarter handling to reduce the impact on running services
  • [4.0?] CDAP service upgrade capability (may require changes to Apache Twill)
  • [4.0?] Move configuration and operational updates to the messaging service

Planned Initiatives

  • Clients use retry and back-off mechanisms so they can operate in degraded mode
  • YARN application resilience through Apache Twill
  • Run the Dataset Service, which currently runs in the CDAP Master, as a YARN application
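
The planned client retry and back-off behavior could look roughly like the sketch below. It uses capped exponential backoff with full jitter. That is one common technique, assumed here rather than a decided design. `call_with_retry`, its parameters, and the choice of retryable exceptions are all hypothetical. The `sleep` parameter is injectable so the behavior can be tested without actually waiting.

```python
import random
import time


def call_with_retry(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation`, retrying retryable failures with capped exponential backoff.

    After `max_attempts` failed attempts the last exception is re-raised, so the
    caller can switch to degraded behavior instead of retrying forever.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the outage to the caller
            # The upper bound doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # "full jitter" waits a random time below it so that many clients
            # do not retry against a recovering service in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))


# Usage: a dependency that fails twice (e.g. during a region server restart), then recovers.
attempts = {"n": 0}


def flaky_lookup():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("service temporarily unavailable")
    return "ok"


delays = []  # record requested waits instead of actually sleeping
result = call_with_retry(flaky_lookup, sleep=delays.append)
print(result, attempts["n"], len(delays))  # ok 3 2
```

The values of `max_attempts`, `base_delay`, and `max_delay` are placeholders. With these defaults, a client waits at most 7.5 seconds in total (0.5 + 1 + 2 + 4) before giving up. Choosing real values is therefore tied to the open item of how long an outage can last and still count as "short".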