HBase DDL

HBase DDL will require some hooks in CDAP, because replication must be set up for every table when it is created, before any data is written to it. CDAP will define an interface to create, modify, and delete HBase tables. Instead of just creating a table in the local HBase instance, we need to create the table in both the master and slave instances and set up replication from the master to the slave. We can do this by introducing an SPI for HBase DDL operations: the default implementation is the current single-cluster implementation, and users can plug in their own implementation that creates tables and sets up replication as needed.

Java SPI

Code Block
/**
 * Executes HBase DDL operations.
 */
public interface HBaseDDLExecutor {

  /**
   * Create the specified namespace if it does not exist.
   *
   * @param name the namespace to create
   * @throws IOException if a remote or network exception occurs
   */
  void createNamespaceIfNotExists(String name) throws IOException;

  /**
   * Delete the specified namespace if it exists.
   *
   * @param name the namespace to delete
   * @throws IOException if a remote or network exception occurs
   * @throws IllegalStateException if there are tables in the namespace
   */
  void deleteNamespaceIfExists(String name) throws IOException;

  /**
   * Create the specified table if it does not exist.
   *
   * @param descriptor the descriptor for the table to create
   * @param splitKeys the split keys used to pre-split the table, or null if the table should not be pre-split
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the namespace for the specified table does not exist
   */
  void createTableIfNotExists(TableDescriptor descriptor, @Nullable byte[][] splitKeys) throws IOException;

  /**
   * Enable the specified table.
   *
   * @param namespace the namespace of the table to enable
   * @param name the name of the table to enable
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the specified table does not exist
   */
  void enableTable(String namespace, String name) throws IOException;

  /**
   * Disable the specified table.
   *
   * @param namespace the namespace of the table to disable
   * @param name the name of the table to disable
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the specified table does not exist
   */
  void disableTable(String namespace, String name) throws IOException;

  /**
   * Modify the specified table. The table must be disabled.
   *
   * @param namespace the namespace of the table to modify
   * @param name the name of the table to modify
   * @param descriptor the descriptor for the table
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the specified table does not exist
   * @throws IllegalStateException if the specified table is not disabled
   */
  void modifyTable(String namespace, String name, TableDescriptor descriptor) throws IOException;
 
  /**
   * Truncate the specified table. The table must be disabled.
   *   
   * @param namespace the namespace of the table to truncate
   * @param name the name of the table to truncate
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the specified table does not exist
   * @throws IllegalStateException if the specified table is not disabled
   */
  void truncateTable(String namespace, String name) throws IOException;

  /**
   * Delete the table if it exists. The table must be disabled.
   *
   * @param namespace the namespace of the table to delete
   * @param name the table to delete
   * @throws IOException if a remote or network exception occurs
   * @throws NotFoundException if the namespace for the specified table does not exist
   * @throws IllegalStateException if the specified table is not disabled
   */
  void deleteTableIfExists(String namespace, String name) throws IOException;
}
 
public class TableDescriptor {
  
}
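
To illustrate how a user-supplied implementation might satisfy the replication requirement, here is a minimal sketch of an executor that forwards each operation to both clusters and sets up replication as part of table creation. It is not the CDAP implementation. DdlOps is a trimmed, hypothetical stand-in for HBaseDDLExecutor (two methods only, with the table descriptor reduced to a namespace and a name), and ReplicationSetup, ReplicatingDdlOps, and RecordingDdlOps are names invented for this example. The master-then-slave ordering is also just an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Usage first; the types it relies on are defined below.
final class ReplicatingDdlDemo {
  public static void main(String[] args) throws IOException {
    List<String> log = new ArrayList<>();
    DdlOps ops = new ReplicatingDdlOps(
        new RecordingDdlOps("master", log),
        new RecordingDdlOps("slave", log),
        (ns, table) -> log.add("replicate " + ns + ":" + table));
    ops.createNamespaceIfNotExists("ns1");
    ops.createTableIfNotExists("ns1", "t1");
    log.forEach(System.out::println);
  }
}

// Trimmed, hypothetical stand-in for the SPI above: two methods only, with
// the table descriptor reduced to a namespace and a table name.
interface DdlOps {
  void createNamespaceIfNotExists(String name) throws IOException;
  void createTableIfNotExists(String namespace, String table) throws IOException;
}

// Hypothetical hook that wires up master-to-slave replication for one table.
interface ReplicationSetup {
  void enableReplication(String namespace, String table) throws IOException;
}

// Applies every DDL operation to both clusters. Replication is set up inside
// the create call, so it is in place before the caller can write any data.
final class ReplicatingDdlOps implements DdlOps {
  private final DdlOps master;
  private final DdlOps slave;
  private final ReplicationSetup replication;

  ReplicatingDdlOps(DdlOps master, DdlOps slave, ReplicationSetup replication) {
    this.master = master;
    this.slave = slave;
    this.replication = replication;
  }

  @Override
  public void createNamespaceIfNotExists(String name) throws IOException {
    master.createNamespaceIfNotExists(name);
    slave.createNamespaceIfNotExists(name);
  }

  @Override
  public void createTableIfNotExists(String namespace, String table) throws IOException {
    master.createTableIfNotExists(namespace, table);
    slave.createTableIfNotExists(namespace, table);
    replication.enableReplication(namespace, table);
  }
}

// In-memory fake that records calls instead of talking to HBase.
final class RecordingDdlOps implements DdlOps {
  private final String cluster;
  private final List<String> log;

  RecordingDdlOps(String cluster, List<String> log) {
    this.cluster = cluster;
    this.log = log;
  }

  @Override
  public void createNamespaceIfNotExists(String name) {
    log.add(cluster + ": create namespace " + name);
  }

  @Override
  public void createTableIfNotExists(String namespace, String table) {
    log.add(cluster + ": create table " + namespace + ":" + table);
  }
}
```

Because the wrapper implements the same interface as the delegates, callers would not need to know whether they are talking to one cluster or two, which is the point of putting the DDL operations behind an SPI.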

The default implementation will simply use the existing HBaseTableUtil. There can be another implementation that makes REST calls for each method, leaving the actual HBase operations and auth up to an external service. For example, an analogous RESTful API could be:

 

Method  Path                                               Request Body                           Description
PUT     /namespaces/<namespace>                            (none)                                 Create namespace if it doesn't exist. No-op if it already exists.
PUT     /namespaces/<namespace>/tables/<table>             HTableDescriptor contents, split keys  Create table if it doesn't exist. No-op if it already exists.
PUT     /namespaces/<namespace>/tables/<table>/properties  HTableDescriptor contents              Modify an existing table.
POST    /namespaces/<namespace>/tables/<table>/enable      (none)                                 Enable an existing table.
POST    /namespaces/<namespace>/tables/<table>/disable     (none)                                 Disable an existing table.
POST    /namespaces/<namespace>/tables/<table>/truncate    (none)                                 Truncate an existing table.
DELETE  /namespaces/<namespace>                            (none)                                 Delete a namespace.
DELETE  /namespaces/<namespace>/tables/<table>             (none)                                 Delete a table.

The user is passed in the request headers. Each endpoint must be idempotent: an operation can fail in one or more HBase instances while succeeding in another, in which case the client will retry the request. A 200 should be returned only if the operation succeeded in all HBase instances.
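
The retry behaviour this implies on the client side can be sketched as follows. This is an illustration under assumptions rather than a proposed client: DdlCall is a hypothetical abstraction over one REST call that returns the HTTP status code, IdempotentRetry and its methods are invented names, and the HTTP transport, the user headers, and any backoff between attempts are left out.

```java
import java.io.IOException;

// Usage first; the types it relies on are defined below.
final class IdempotentRetryDemo {
  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Fake service: reports a failure (500) twice, then success in all instances (200).
    DdlCall createTable = () -> ++calls[0] < 3 ? 500 : 200;
    int attempts = IdempotentRetry.callUntilOk(createTable, 5);
    System.out.println("PUT " + IdempotentRetry.tablePath("ns1", "t1")
        + " succeeded after " + attempts + " attempts");
  }
}

// Hypothetical abstraction over a single REST call; returns the HTTP status code.
interface DdlCall {
  int execute() throws IOException;
}

final class IdempotentRetry {
  private IdempotentRetry() { }

  // Builds the table path used by the table endpoints listed above.
  static String tablePath(String namespace, String table) {
    return "/namespaces/" + namespace + "/tables/" + table;
  }

  // Re-issues the same request until the service returns 200, and returns the
  // number of attempts used. Repeating the request is only safe because the
  // endpoints are idempotent: an instance that already applied the operation
  // treats the repeat as a no-op.
  static int callUntilOk(DdlCall call, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException lastError = null;
    int lastStatus = -1;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        lastStatus = call.execute();
        if (lastStatus == 200) {
          return attempt;
        }
        lastError = null;
      } catch (IOException e) {
        lastError = e;
      }
    }
    if (lastError != null) {
      throw lastError;
    }
    throw new IOException("DDL request failed after " + maxAttempts
        + " attempts, last status " + lastStatus);
  }
}
```

A real client would probably also wait between attempts and distinguish retryable failures from permanent ones (a bad request, for instance), but neither is specified here.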

Coprocessors

One difficulty will be in handling the coprocessor jar. Today, when a Table is being created, its coprocessor jar is also built and placed on HDFS.

One way to handle this is to send the jar contents as part of the table creation request (Base64-encoded, for example). However, this breaks down if the master and slave clusters are running different versions of HBase that require different coprocessors. For it to work, we would have to consolidate all coprocessors into a single one that works for all supported HBase versions, which does not seem possible, as HBase offers no coprocessor compatibility guarantees across versions.

Instead, each CDAP instance will include a tool that pre-builds a coprocessor jar and places it on HDFS in a predetermined location. Rather than being built on demand, the jar will always be present on HDFS.