...

  1. HDFS:
    1. Hadoop DistCp (distributed copy) is a tool for large inter- and intra-cluster copying. It uses MapReduce for distribution, error handling and recovery, and reporting. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list.
    2. Hadoop Distributed Copy Command: http://hadoop.apache.org/docs/r1.2.1/distcp2.html
    3. Cloudera DistCp page: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_admin_distcp_data_cluster_migrate.html
    4. HortonWorks: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_Sys_Admin_Guides/content/using_distcp.html
  2. HBase:
  3. Kafka:
  4. FileSets
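
The HDFS entry above says that the copy tool expands the source list into input for map tasks, each of which copies a partition of the files. The Python sketch below illustrates only that partitioning idea. It is not Hadoop's implementation: the function name `partition_files`, the greedy size-balancing strategy, and the example paths and sizes are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def partition_files(files: Dict[str, int], num_maps: int) -> List[List[str]]:
    """Split files (path -> size in bytes) into num_maps groups of similar total size.

    Hypothetical helper: a simple greedy balance, not the strategy Hadoop uses.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")

    # Min-heap of (bytes assigned so far, map task index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_maps)]
    groups: List[List[str]] = [[] for _ in range(num_maps)]

    # Take the largest files first and give each one to the map task
    # that currently has the fewest bytes assigned.
    for path, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (load + size, idx))

    return groups


if __name__ == "__main__":
    # Made-up paths and sizes, for illustration only.
    example = {"/data/a": 400, "/data/b": 300, "/data/c": 200, "/data/d": 100}
    for task, paths in enumerate(partition_files(example, 2)):
        print(task, paths)
```

Each inner list stands for the files one map task would copy. Balancing by bytes rather than by file count keeps a single large file from leaving one task far behind the others.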

...