This is a major undertaking to improve resiliency, support rolling upgrades / downgrades, handle cross cluster or cross data center replication and also improving performance of certain components. This area will act as a central place for all the requirements gathered, design documents and other technical discussions related to this topic.Â
- Start Release : 3.6
- Working Group Mailing List :Â
- Slack Channel : tyche-group.slack.com
- Approximate Timeline : Best-case : 6 months, Worst-case : 10 months
- Primary Technical Contact :Â Terence Yim (Unlicensed)
- Primary Program Contact :Â priyanambiar
Here we will define different tracks that are in part of this initiative and break it down into multiple releases. Please note the problems that are being solved as part of this initiative are some of the challenging and complex problems. So, they are been broken down into multiple releases for each track.
Legend
- Â Link to design document
- Â Associated JIRA on Cask issue or Apache project issue system
Â
Terminology | Definition |
---|
CDAP Active/Standby | - CDAP is only running in one cluster (active cluster)
- CDAP in all other clusters shouldn't be running (standby cluster)
- User applications can only be running in the active cluster
- Data is being replicated via means outside of CDAP control
- HBase replication
- HDFS copy
- Kafka mirror-maker
- Data is available on all clusters
- Data can be read on Standby clusters outside of CDAP
- High level failover steps
- Stop all running apps in active cluster
- Stop CDAP in active cluster
- Wait for all data replication to settle
- Check via tools
- Pick a standby cluster to be the next active cluster and start CDAP
- Start applications on the new active cluster
- Strictly speaking, CDAP is not aware of the replication at all
- Since CDAP is not aware of the replication, all data needs to be replicated.
- Otherwise inconsistency could occur when restarting CDAP on a different cluster than the previously active one.
- Already doable in CDAP 3.5
- CDAP 4.1 added extra API and tools to assist
- API for externalizing HBase table creation
- Tools for checking replication status
|
Hot/Cold replication | Same as CDAP Active/Standby |
Active/Passive replication | Same as CDAP Active/Standby |
CDAP Active/Active | - CDAP is running in all clusters
- CDAP is aware of the state replications between all CDAP instances
- Relies on external means for data replication
- CDAP system tables in HBase
- Kafka topics for log collection
- User data replication is outside of CDAP control
- Depends on what user applications use
- E.g. HBase replication, HDFS copy and Kafka mirror-maker
- It is still active/standby from the application point of view
- User can declare which cluster is the active one at the namespace level
- User can declare which namespace needs replication and which one does not
- User can change the active cluster for a namespace
- To switch the active cluster, the following will happen
- Stopping all running applications in that namespace in the active cluster
- Pick another cluster as the new active cluster for the namespace involved
- Start applications in that namespace again in the new active cluster
- CDAP will provide easy switch to perform the three steps described above on behalf of the user
|
Hot/Hot replication | Same as CDAP Active/Active |
CDAP Active/Active with Application Master/Slaves | - Have everything described in CDAP Active/Active
- User data is still being replicated via means outside of CDAP control
- For a namespace
- The active cluster for that namespace is the "master"
- Writes only happen in the master cluster
- All other clusters are the "slave" clusters
- Receives user data updates from the "master" cluster via replication
- Can start applications in that namespace for "read-only" operation
- The same application that is already running in the "master" cluster cannot be started on any "slave" clusters
|
Reference Documents
- Functional Components and their impacts – here
- RAFT Protocol – here
- Cloudera Software Integration Guide – hereÂ
Â