Resiliency, Replication, Rolling Upgrade, Performance


Introduction

This is a major undertaking to improve resiliency, support rolling upgrades / downgrades, handle cross cluster or cross data center replication and also improving performance of certain components. This area will act as a central place for all the requirements gathered, design documents and other technical discussions related to this topic. 

Initiative Details

  • Start Release : 3.6
  • Working Group Mailing List : 
  • Slack Channel : tyche-group.slack.com
  • Approximate Timeline : Best-case : 6 months, Worst-case : 10 months
  • Primary Technical Contact : Terence Yim (Unlicensed)
  • Primary Program Contact : priyanambiar

Tracks and Execution Layout

Here we will define different tracks that are in part of this initiative and break it down into multiple releases. Please note the problems that are being solved as part of this initiative are some of the challenging and complex problems. So, they are been broken down into multiple releases for each track.

Legend

  •   Link to design document
  •  Associated JIRA on Cask issue or Apache project issue system

 

Tracks

Release 1

Release 2Release 3Release 4Release 5
Status
StageDesign & Design ReviewsRequirement GatheringRequirement GatheringRequirement GatheringRequirement Gathering
Data Replication
  • RS-016 - Application Versioning –
  • HBase DDL SPI –
  • Handling Kafka offset mis-matches –
  • Replication status tool (Design WIP) –
  • Transactional Messaging System –
  • Metadata updates via Transactional Messaging
  • Design for distributed transactions
  • Replication tools - phase 1

 

  • Program state change updates via Transactional Messaging
  • Distributed transactions implementation - phase 1
  • Replication of metadata
  • Replication tools - phase 2
  • Replication of program state changes
  • Distributed transactions implementation - phase 2
  • Design for namespace replication
  • Replication tools - phase 3
  • Namespace replication
  • Replication tools - phase 4
Disaster Recovery     
Resilience
  • RS-001 - Minimize interruption caused by update of coprocessor
  • RS-002 - Client Resiliency –  
  • RS-016 - Application Versioning –
  • RS-004 - CDAP version definition and guarantees of version
  • RS-002 - Client Resiliency
  • RS-003 - Make CDAP system services HA
  • RS-008 - Apache Twill Application rolling upgrade
  • RS-011 - Hydrator Pipeline upgrade
  • RS-013 - Test framework Chaos Monkey
  • RS-014 - User Interface/CLI
  • RS-010 - Progressive background upgrade tool
  • RS-005 - Internal Schema evolution and Management
  • RS-006 - Managing Infrastructure Incompatibility
  • RS-012 - Dataset Upgrade
  • RS-013 - Test framework Chaos Monkey
  • RS-014 - User Interface/CLI
  • RS-009 - Upgrade Orchestrator
  • RS-007 - System State Transition and Management
  • RS-006 - Managing Infrastructure Incompatibility
  • RS-012 - Dataset Upgrade
  • RS-015 - Support for Rollbacks
  • RS-014 - User Interface/CLI
  • RS-009 - Upgrade Orchestrator
  • RS-006 - Managing Infrastructure Incompatibility
  • RS-015 - Support for Rollbacks
  • RS-014 - User Interface/CLI
      
      
Performance & Scalability
  • P&S-001 - Invalid list pruning with major compaction –
  • P&S-002 - Tool to report progress on pruning and inspecting invalid list

  • P&S-004 - Transaction server per namespace for resource isolation
  • P&S-005 - Support for running multiple instance of transaction server in active-active mode
  • P&S-006 - Invalid list pruning with stripped compaction

 

What does this release provide ?

 

  • Dynamic loading of hbase compact modules within co-processor which will relive upgrade tool from disabling hbase tables
  • Handling resiliency in interacting with HBase and Zookeeper
  • Documentation describing how CDAP would version and guarantees that it would be providing in light of rolling upgrade and resiliency requirements being added to platform
  • No more manual process to cleaning up invalid transaction list; no more manual maintenance required
  • Improves throughput of transactional data operations
  • Reduce network traffic; reduce transaction metadata
    
User Stories Enabled by the Release
  • Hot-Cold Replication
    • Users should be able to set up hot - cold replication using CDAP
    • Users will be provided a tool to get status of HBase replication to determine when it is safe to switch the cluster from cold-hot
    • Users should be able to implement a hook to manage HBase DDL across replicated clusters
    • CDAP will provide capabilities to automatically handle Kafka mismatches
    

Terminologies 

TerminologyDefinition
CDAP Active/Standby
  • CDAP is only running in one cluster (active cluster)
    • CDAP in all other clusters shouldn't be running (standby cluster)
    • User applications can only be running in the active cluster
  • Data is being replicated via means outside of CDAP control
    • HBase replication
    • HDFS copy
    • Kafka mirror-maker
  • Data is available on all clusters
    • Data can be read on Standby clusters outside of CDAP
  • High level failover steps
    1. Stop all running apps in active cluster
    2. Stop CDAP in active cluster
    3. Wait for all data replication to settle
      1. Check via tools
    4. Pick a standby cluster to be the next active cluster and start CDAP
    5. Start applications on the new active cluster
  • Strictly speaking, CDAP is not aware of the replication at all
    • Since CDAP is not aware of the replication, all data needs to be replicated.
    • Otherwise inconsistency could occur when restarting CDAP on a different cluster than the previously active one.
  • Already doable in CDAP 3.5
  • CDAP 4.1 added extra API and tools to assist
    • API for externalizing HBase table creation
    • Tools for checking replication status
Hot/Cold replicationSame as CDAP Active/Standby
Active/Passive replicationSame as CDAP Active/Standby
CDAP Active/Active
  • CDAP is running in all clusters
  • CDAP is aware of the state replications between all CDAP instances
  • Relies on external means for data replication
    • CDAP system tables in HBase
    • Kafka topics for log collection
  • User data replication is outside of CDAP control
    • Depends on what user applications use
      • E.g. HBase replication, HDFS copy and Kafka mirror-maker
  • It is still active/standby from the application point of view
    • User can declare which cluster is the active one at the namespace level
    • User can declare which namespace needs replication and which one does not
    • User can change the active cluster for a namespace
      • To switch the active cluster, the following will happen
        1. Stopping all running applications in that namespace in the active cluster
        2. Pick another cluster as the new active cluster for the namespace involved
        3. Start applications in that namespace again in the new active cluster
      • CDAP will provide easy switch to perform the three steps described above on behalf of the user
Hot/Hot replicationSame as CDAP Active/Active
CDAP Active/Active with Application Master/Slaves
  • Have everything described in CDAP Active/Active
  • User data is still being replicated via means outside of CDAP control
  • For a namespace
    • The active cluster for that namespace is the "master"
      • Writes only happen in the master cluster
    • All other clusters are the "slave" clusters
      • Receives user data updates from the "master" cluster via replication
      • Can start applications in that namespace for "read-only" operation
        • The same application that is already running in the "master" cluster cannot be started on any "slave" clusters


Reference Documents
  • Functional Components and their impacts – here
  • RAFT Protocol – here
  • Cloudera Software Integration Guide – here 

Â