
Goals

 

Reference

 

Requirements

  • Support Active-Active and Active-Passive configurations
  • Provide a tool or status indicator that shows whether replication is complete or in a safe state
  • Support replicating HBase DDL to the remote cluster, including dynamic creation of tables
  • Handle Kafka offset management across multiple clusters (a shortcoming of MirrorMaker)
  • Support replicating the routing configuration stored in ZooKeeper to the remote cluster

Replications:

  1. HDFS:
    1. Hadoop DistCp is a tool for large inter/intra-cluster copying. It uses MapReduce for distribution, error handling and recovery, and reporting. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list.
    2. Hadoop Distributed Copy Command: http://hadoop.apache.org/docs/r1.2.1/distcp2.html
    3. Cloudera DistCp page: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_admin_distcp_data_cluster_migrate.html
    4. Hortonworks DistCp page: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_Sys_Admin_Guides/content/using_distcp.html
    5. How do we copy data iteratively?
    6. What is the data quantum (the unit of data copied in each iteration)?
    7. DistCp can copy individual files. Could we copy files at fixed time boundaries, for example at the end of each day?
  2. HBase:
    a. HBase supports replication to multiple clusters in multiple topologies. Documentation: http://hbase.apache.org/book.html#_cluster_replication
    b. How do we check that replication is complete when the customer is ready to switch over to the remote cluster?
      1. Check whether this replication metric can be used to determine that:
        • source.sizeOfLogQueue: the number of WALs still to process (excluding the one currently being processed) at the replication source
  3. Kafka:
  4. FileSets
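
For the iterative HDFS copy questions above, one possible approach is to treat a day's directory as the data quantum and run DistCp once per day with its -update option, so that a rerun only copies files that are missing or differ at the destination. The sketch below only builds the command line; the cluster URIs and the year/month/day directory layout are assumptions made for illustration, not part of this design.

```python
from datetime import date


def build_distcp_command(src_root, dst_root, day):
    """Build the argv for copying one day's directory with DistCp.

    Assumes (hypothetically) that data is laid out as <root>/YYYY/MM/DD.
    """
    partition = day.strftime("%Y/%m/%d")
    return [
        "hadoop", "distcp",
        "-update",  # copy only files that are missing or differ at the destination
        f"{src_root.rstrip('/')}/{partition}",
        f"{dst_root.rstrip('/')}/{partition}",
    ]


if __name__ == "__main__":
    # Hypothetical NameNode addresses and paths.
    cmd = build_distcp_command(
        "hdfs://nn1:8020/data/events",
        "hdfs://nn2:8020/data/events",
        date(2016, 9, 30),
    )
    print(" ".join(cmd))
```

The returned list can be passed to a scheduler or to subprocess.run at the end of each day. Whether a day is the right quantum remains an open question.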

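For the HBase switchover check above, a status tool could collect source.sizeOfLogQueue from every region server and report a safe state only when all queues are drained. The sketch below is a minimal illustration under stated assumptions: how the metrics are collected is left out, and the input shape (a dict of per-server metric dicts) and the function name are hypothetical. Because the metric excludes the WAL currently being processed, a zero queue suggests but does not prove that replication has fully caught up.

```python
def replication_drained(region_server_metrics, max_queued_wals=0):
    """Return True when every region server reports a drained WAL queue.

    region_server_metrics: hypothetical mapping of server name to a dict of
    replication metrics that includes "source.sizeOfLogQueue".
    """
    if not region_server_metrics:
        return False  # no data collected: do not report a safe state
    for metrics in region_server_metrics.values():
        queued = metrics.get("source.sizeOfLogQueue")
        # A missing metric is treated as unsafe rather than as zero.
        if queued is None or queued > max_queued_wals:
            return False
    return True


if __name__ == "__main__":
    sample = {
        "rs1.example.com": {"source.sizeOfLogQueue": 0},
        "rs2.example.com": {"source.sizeOfLogQueue": 3},
    }
    print(replication_drained(sample))
```

Treating missing data as unsafe keeps the tool conservative, which seems appropriate for a switchover decision.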
 

 

Challenges

 

Open Questions
