Terminology | Definition |
---|
CDAP Active/Standby | - CDAP is only running in one cluster (active cluster)
- CDAP in all other clusters shouldn't be running (standby cluster)
- User applications can only be running in the active cluster
- Data is being replicated via means outside of CDAP control
- HBase replication
- HDFS copy
- Kafka mirror-maker
- Data is available on all clusters
- Data can be read on Standby clusters outside of CDAP
- High level failover steps
- Stop all running apps in active cluster
- Stop CDAP in active cluster
- Wait for all data replication to settle
- Check via tools
- Pick a standby cluster to be the next active cluster and start CDAP
- Start applications on the new active cluster
- Strictly speaking, CDAP is not aware of the replication at all
- Since CDAP is not aware of the replication, all data needs to be replicated.
- Otherwise inconsistency could occur when restarting CDAP on a different cluster than the previously active one.
- Already doable in CDAP 3.5
- CDAP 4.1 added extra API and tools to assist
- API for externalizing HBase table creation
- Tools for checking replication status
|
Hot/Cold replication | Same as CDAP Active/Standby |
Active/Passive replication | Same as CDAP Active/Standby |
CDAP Active/Active | - CDAP is running in all clusters
- CDAP is aware of the state replications between all CDAP instances
- Relies on external means for data replication
- CDAP system tables in HBase
- Kafka topics for log collection
- User data replication is outside of CDAP control
- Depends on what user applications use
- E.g. HBase replication, HDFS copy and Kafka mirror-maker
- It is still active/standby from the application point of view
- User can declares declare which cluster is the active one at the namespace level
- User can declares declare which namespace need replications needs replication and which one does not
- User can change the active cluster for a namespace
- To switch the active cluster, the following will happen
- Stopping all running applications in that namespace in the active cluster
- Pick another cluster as the new active cluster for the namespace involved
- Start applications in that namespace again in the new active cluster
- CDAP will provide easy switch to perform the three steps described above on behalf of the user
|
Hot/Hot replication | Same as CDAP Active/Active |
CDAP Active/Active with Application Master/Slaves | - Have everything described in CDAP Active/Active
- User data is still being replicated via means outside of CDAP control
- For a namespace
- The active cluster for that namespace is the "master"
- Writes only happen in the master cluster
- All other clusters are the "slave" clusters
- Receives user data updates from the "master" cluster via replication
- Can start applications in that namespace for "read-only" operation
- The same application that is already running in the "master" cluster cannot be started on any "slave" clusters
|