RS-001 Coprocessor Rolling Upgrade

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

One reason CDAP must be stopped before an upgrade is so that the upgrade tool can update the coprocessors for all CDAP tables. To minimize downtime, we would like to upgrade coprocessors in a rolling fashion.

Goals

Design a method to upgrade CDAP HBase coprocessors in a rolling fashion, with minimal downtime.

User Stories 

  • As a cluster administrator, I want to be able to upgrade CDAP coprocessors without stopping CDAP
  • As a cluster administrator, I want to be able to upgrade HBase without stopping CDAP

Design

Prior to 4.1.0, coprocessors are built and uploaded to HDFS when the dataset is created. When the HBase table is created, it is configured with the HDFS path, class name, and priority of each coprocessor. During a CDAP upgrade, CDAP is stopped and an upgrade tool loops through all tables. For each table, it disables the table, builds and uploads the new coprocessor jars, modifies the table to point to the new coprocessor(s) on HDFS, and then re-enables the table. The advantage of this approach is that CDAP manages coprocessors itself, so cluster administrators don't need to know anything about coprocessors. The drawback is that upgrading a coprocessor requires downtime.
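The offline procedure can be summarized as the loop below. This is a minimal sketch for illustration only: `TableAdmin`, `JarUploader`, and `RecordingAdmin` are hypothetical stand-ins for the HBase admin calls and jar-building logic of the real upgrade tool, and the table names, class name, and jar paths are made up. What it shows is that each table is unavailable from `disableTable` until `enableTable`, which is where the downtime comes from.

```java
import java.util.ArrayList;
import java.util.List;

public class OfflineCoprocessorUpgrade {

  // Hypothetical abstraction over the HBase admin operations the upgrade tool performs.
  interface TableAdmin {
    List<String> listCdapTables();
    void disableTable(String table);
    void setCoprocessor(String table, String className, String jarPath, int priority);
    void enableTable(String table);
  }

  // Hypothetical hook: builds the coprocessor jar for a table, uploads it to HDFS,
  // and returns the HDFS path it was written to.
  interface JarUploader {
    String buildAndUpload(String table);
  }

  // The offline upgrade loop: each table is offline between disable and enable.
  static void upgradeAll(TableAdmin admin, JarUploader uploader, String className, int priority) {
    for (String table : admin.listCdapTables()) {
      admin.disableTable(table);
      String jarPath = uploader.buildAndUpload(table);
      admin.setCoprocessor(table, className, jarPath, priority);
      admin.enableTable(table);
    }
  }

  // In-memory fake that records every call, for illustration only.
  static class RecordingAdmin implements TableAdmin {
    final List<String> tables;
    final List<String> log = new ArrayList<>();

    RecordingAdmin(List<String> tables) { this.tables = tables; }

    public List<String> listCdapTables() { return tables; }
    public void disableTable(String t) { log.add("disable " + t); }
    public void setCoprocessor(String t, String c, String p, int prio) {
      log.add("modify " + t + " -> " + c + "@" + p + " (priority " + prio + ")");
    }
    public void enableTable(String t) { log.add("enable " + t); }
  }

  public static void main(String[] args) {
    RecordingAdmin admin = new RecordingAdmin(List.of("purchases", "app_meta"));
    // Hypothetical jar location and coprocessor class name.
    upgradeAll(admin, t -> "/tmp/coprocessors/" + t + ".jar", "example.TableCoprocessor", 1);
    admin.log.forEach(System.out::println);
  }
}
```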

Approach

We first describe the approach for CDAP rolling upgrade, assuming that no HBase upgrade is happening.

Rolling CDAP upgrade

We will change the coprocessors used by Tables to be wrappers that look up the CDAP version, download the relevant coprocessor jar from HDFS, instantiate the relevant class, and then delegate all calls to the instantiated class. In more detail: on startup, CDAP will upload all required coprocessors to predetermined locations on HDFS:

/cdap/lib/coprocessors/table-<cdap-version>-<hbase-version>.jar

For example, with HBase 1.1.0, the actual coprocessor implementations for three CDAP versions would be placed on HDFS at:

/cdap/lib/coprocessors/table-4.1.0-1.1.0.jar

/cdap/lib/coprocessors/table-4.1.1-1.1.0.jar

/cdap/lib/coprocessors/table-4.1.2-1.1.0.jar

The wrapper coprocessor will also be placed on HDFS, but the same jar can be used for all versions of CDAP:

/cdap/lib/coprocessors/base-1.1.0.jar
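A small helper makes the naming scheme concrete. It is a sketch: the class and method names are hypothetical, and it assumes that the version suffix on the wrapper jar (`base-1.1.0.jar`) is the HBase version, since the wrapper is shared across CDAP versions but still compiled against a specific HBase.

```java
public class CoprocessorJarPaths {

  // Root directory taken from the example paths above.
  static final String COPROCESSOR_DIR = "/cdap/lib/coprocessors";

  // Version-specific implementation: table-<cdap-version>-<hbase-version>.jar
  static String implementationJar(String cdapVersion, String hbaseVersion) {
    return COPROCESSOR_DIR + "/table-" + cdapVersion + "-" + hbaseVersion + ".jar";
  }

  // Wrapper jar, shared by all CDAP versions (assumed: one per HBase version).
  static String wrapperJar(String hbaseVersion) {
    return COPROCESSOR_DIR + "/base-" + hbaseVersion + ".jar";
  }

  public static void main(String[] args) {
    for (String cdap : new String[] {"4.1.0", "4.1.1", "4.1.2"}) {
      System.out.println(implementationJar(cdap, "1.1.0"));
    }
    System.out.println(wrapperJar("1.1.0"));
  }
}
```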

Each HBase table will be configured to use the wrapper coprocessor. When the wrapper starts up, it will read the CDAP version from a predefined table, download the required coprocessor jar, create a classloader from it, and instantiate the actual coprocessor class. This change is completely transparent to CDAP users and cluster administrators.
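The delegation pattern can be sketched in plain Java as below. This is not the HBase coprocessor API: `TableCoprocessor` is a hypothetical one-method stand-in for the many observer hooks a real wrapper would forward, and `ImplementationLoader`, `jarLoader`, and the version supplier are assumptions for illustration. `jarLoader` uses the JDK's `URLClassLoader` and assumes the version-specific jar has already been copied from HDFS to a local file.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Function;
import java.util.function.Supplier;

public class WrapperCoprocessorSketch {

  // Hypothetical, heavily simplified coprocessor contract.
  public interface TableCoprocessor {
    String preGet(String row);
  }

  // Hypothetical hook that produces the implementation for a given CDAP version.
  public interface ImplementationLoader {
    TableCoprocessor load(String cdapVersion) throws Exception;
  }

  // The wrapper resolves its delegate once, at startup, then forwards every call to it.
  public static class Wrapper implements TableCoprocessor {
    private final TableCoprocessor delegate;

    public Wrapper(Supplier<String> cdapVersionReader, ImplementationLoader loader) {
      // Assumed: the supplier reads the CDAP version from the predefined table.
      String version = cdapVersionReader.get();
      try {
        this.delegate = loader.load(version);
      } catch (Exception e) {
        throw new IllegalStateException("Cannot load coprocessor for CDAP " + version, e);
      }
    }

    @Override
    public String preGet(String row) {
      return delegate.preGet(row);
    }
  }

  // One possible loader using plain JDK class loading. Assumes the implementation class
  // has a public no-arg constructor. The class loader is deliberately left open, because
  // the delegate's classes are needed for as long as the wrapper is running.
  static ImplementationLoader jarLoader(Function<String, URL> localJarFor, String className) {
    return version -> {
      URLClassLoader loader = new URLClassLoader(
          new URL[] {localJarFor.apply(version)}, WrapperCoprocessorSketch.class.getClassLoader());
      Class<?> implClass = loader.loadClass(className);
      return (TableCoprocessor) implClass.getDeclaredConstructor().newInstance();
    };
  }

  public static void main(String[] args) {
    // In-memory loader for illustration: each "implementation" tags the row with its version.
    ImplementationLoader fake = version -> row -> "CDAP " + version + " handled " + row;
    Wrapper wrapper = new Wrapper(() -> "4.1.0", fake);
    System.out.println(wrapper.preGet("row1"));
  }
}
```

Using the wrapper's own classloader as the parent keeps the `TableCoprocessor` interface shared between the wrapper and every version-specific implementation, so the cast in `jarLoader` succeeds.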

Rolling HBase upgrade

Rolling HBase upgrade will be considered an advanced configuration that requires additional work from the cluster administrator. We will add a configuration setting, 'master.manage.coprocessors', that defaults to 'true'. When it is true, CDAP handles coprocessors as before and cluster administrators don't have to do any additional work; however, there will be downtime when upgrading CDAP or HBase. When it is set to false, CDAP creates HBase tables that specify only the wrapper coprocessor's class name and priority, not its HDFS path. Instead of being placed on HDFS, the CDAP wrapper coprocessor jar must be installed on every HBase node and included in the HBase classpath.
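The effect of 'master.manage.coprocessors' on what CDAP records for the wrapper coprocessor can be sketched as follows. `CoprocessorSpec`, `wrapperSpec`, `manageCoprocessors`, and the class name in `main` are hypothetical; the configuration is modeled as a plain `Map` rather than CDAP's actual configuration object, and the HDFS path reuses the wrapper-jar layout described earlier.

```java
import java.util.Map;
import java.util.Optional;

public class CoprocessorSpecSketch {

  // What would be written into a table descriptor for the wrapper coprocessor.
  // An empty jarPath means HBase must find the class on its own classpath.
  record CoprocessorSpec(String className, Optional<String> jarPath, int priority) {}

  // Reads the setting, defaulting to true as described above.
  static boolean manageCoprocessors(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault("master.manage.coprocessors", "true"));
  }

  static CoprocessorSpec wrapperSpec(boolean manage, String className,
                                     String hbaseVersion, int priority) {
    if (manage) {
      // Default mode: CDAP uploads the wrapper jar to HDFS and points the table at it.
      return new CoprocessorSpec(className,
          Optional.of("/cdap/lib/coprocessors/base-" + hbaseVersion + ".jar"), priority);
    }
    // Advanced mode: class name and priority only; the administrator installs the
    // wrapper jar on every HBase node and adds it to the HBase classpath.
    return new CoprocessorSpec(className, Optional.empty(), priority);
  }

  public static void main(String[] args) {
    boolean manage = manageCoprocessors(Map.of("master.manage.coprocessors", "false"));
    System.out.println(wrapperSpec(manage, "example.WrapperCoprocessor", "1.1.0", 1));
  }
}
```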

To upgrade HBase in a rolling fashion, cluster administrators must install the new CDAP wrapper coprocessor jar on each node being upgraded and then restart that node's RegionServer.

Both

Since the change that supports rolling CDAP upgrades is internal to CDAP, supporting both requires the same work as supporting rolling HBase upgrades alone.

API changes

No changes to programmatic APIs

New REST APIs

No REST API changes

CLI Impact or Changes

  • None

UI Impact or Changes

  • None

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures, e.g. YARN and HBase) and how the design takes care of these aspects

Test Scenarios

Test ID | Test Description | Expected Results
1 | Run an app that uses all coprocessor features (readless increments, etc.) on CDAP 3.5.2. Perform a rolling upgrade without stopping the app. | Table contents are as expected
2 | Run an app that uses all coprocessor features on CDAP 4.1.0. Perform a rolling upgrade of HBase to another supported version without stopping the app. | Table contents are as expected

Releases

Release 4.1.0

Related Work

  • Work #1
  • Work #2
  • Work #3

Future work