Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

We want to eliminate reliance on the upgrade tool so that we can move toward the goal of zero or minimal downtime.

Goals

For this specific work, the goal is to remove the upgrade of metadata state from the Upgrade Tool and move it to background threads started in the individual stores: DatasetBasedTimeScheduleStore, DatasetBasedStreamSizeScheduleStore, and AppMetadataStore.

User Stories 

  • User A wants to upgrade from CDAP version X to Y with minimal downtime. Since CDAP and its programs must be stopped while the upgrade tool is running, the user wants the upgrade tool to finish as quickly as possible. This implies doing only the minimum required work in the upgrade tool and moving the rest to background threads in the individual stores.
  • User B has manual replication set up from cluster A to cluster B. When B becomes passive and is being upgraded, we can't start the tx manager and update the HBase table entries; this has to be done while the cluster is active. Thus any transactional data modification should happen after CDAP starts up and not in the upgrade tool.

Design

Currently the Upgrade Tool performs three high-level operations:
a) upgrade the coprocessors of CDAP Datasets
b) modify the stream store (this step will be removed, since it was already present in 3.5)
c) add app versions to three datasets - DatasetBasedTimeScheduleStore, DatasetBasedStreamSizeScheduleStore, AppMetadataDataset

Step a) is performed sequentially, one dataset at a time, so its contribution to the upgrade tool's run time is proportional to the number of datasets in CDAP.
Step c) needs to move into the respective data stores; the upgrade tool should no longer perform that operation.

Approach

Approach #1

For each of the Datasets where App version needs to be added:

Step 1) Since we can't upgrade the datasets in the upgrade tool, we need to do it after CDAP starts up. That means the dataset store must be able to work with both the old format and the new versioned format.
Step 2) The store checks whether the app version needs to be upgraded, based on a key in the table that records the last 'CDAP' version of the dataset. If that version is not the latest, the store starts a background thread that updates the entries.
Step 3) During normal dataset operations (for example, pause schedule, delete schedule, or add schedule), the following rules apply:

  • For Update of Record - only update the versioned entry
  • For Addition of Record - only add the versioned entry
  • For Deletion of Record - check both the versioned and the non-versioned entry and delete them
  • For List of Records - scan with and without versions, add versions to the results of the version-less scan, then combine both lists and return the result
  • Transactional operations should be retried if they fail with a TransactionConflictException, since a background thread is updating the same records
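
The rules above can be illustrated with a small in-memory model. This is only a sketch in Python for brevity, not the actual Java store code: the class name, the tuple key layouts, and the DEFAULT_VERSION value are all assumptions made for illustration. It shows a store that reads both key formats but writes only the versioned one.

```python
from typing import Dict, Tuple

# Assumed version assigned to version-less (pre-upgrade) entries.
DEFAULT_VERSION = "default"

Key = Tuple[str, ...]


class DualFormatScheduleStore:
    """Toy store: old keys are (app, name), new keys are (app, version, name)."""

    def __init__(self) -> None:
        self.table: Dict[Key, str] = {}

    def add(self, app: str, version: str, name: str, state: str) -> None:
        # Addition: only the versioned entry is written.
        self.table[(app, version, name)] = state

    def update(self, app: str, version: str, name: str, state: str) -> None:
        # Update: only the versioned entry is written. A version-less copy,
        # if present, is left for the background upgrade thread to remove.
        self.table[(app, version, name)] = state

    def delete(self, app: str, version: str, name: str) -> None:
        # Deletion: remove both the versioned and the version-less entry.
        self.table.pop((app, version, name), None)
        self.table.pop((app, name), None)

    def list_schedules(self, app: str) -> Dict[Key, str]:
        # Listing: combine both formats; a versioned entry takes precedence
        # over a version-less entry that maps to the same key.
        result: Dict[Key, str] = {}
        for key, state in self.table.items():
            if key[0] != app:
                continue
            if len(key) == 2:
                result.setdefault((app, DEFAULT_VERSION, key[1]), state)
            else:
                result[key] = state
        return result
```

In this model an update of a record that exists only in the old format creates the versioned entry, and listing then reports the versioned state rather than the stale version-less one.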

Background Threads:

  • A thread is started in each Store whenever the store detects that the latest CDAP version doesn't match the upgraded version recorded for the Dataset
  • The logic to upgrade the entries in the dataset is already present in each store. The threads can leverage that logic.
  • When the thread finds an entry to update, it should check whether an entry with the updated version already exists in the dataset. If it does, the thread should remove the version-less entry and not replace the versioned one (since the versioned entry could have been written by the store before the upgrade thread reached that entry).
  • When all the entries have been upgraded, the thread should set the recorded version of the dataset to the current version and then exit.
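
The background thread behaviour described above could look roughly like the following. This is a hedged sketch in Python rather than the stores' Java code: the VERSION_KEY marker row, the tuple key layouts, the DEFAULT_VERSION value, and the use of a lock as a stand-in for one transaction per entry are all assumptions.

```python
import threading
from typing import Dict, Optional, Tuple

Key = Tuple[str, ...]

VERSION_KEY: Key = ("upgrade.version",)  # hypothetical marker row
DEFAULT_VERSION = "default"              # assumed version for old entries


def needs_upgrade(table: Dict[Key, str], current_version: str) -> bool:
    # The dataset is up to date only if the marker matches the running version.
    return table.get(VERSION_KEY) != current_version


def upgrade_entries(table: Dict[Key, str], current_version: str,
                    lock: threading.Lock) -> None:
    with lock:
        # Old-format keys are (app, name); new-format keys have three parts.
        legacy_keys = [key for key in table if len(key) == 2]
    for key in legacy_keys:
        with lock:  # stands in for one transaction per entry
            state = table.pop(key, None)  # always remove the version-less entry
            if state is None:
                continue  # the store deleted it in the meantime
            # Keep an existing versioned entry; it may be newer than ours.
            table.setdefault((key[0], DEFAULT_VERSION, key[1]), state)
    with lock:
        # Record completion only after every entry has been upgraded.
        table[VERSION_KEY] = current_version


def start_upgrade_if_needed(table: Dict[Key, str], current_version: str,
                            lock: threading.Lock) -> Optional[threading.Thread]:
    if not needs_upgrade(table, current_version):
        return None
    thread = threading.Thread(target=upgrade_entries,
                              args=(table, current_version, lock), daemon=True)
    thread.start()
    return thread
```

Because the marker is written last, a thread that dies midway leaves the marker unchanged, and the next check starts the upgrade again over whatever version-less entries remain.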

API changes 

None

New Programmatic APIs

None

Deprecated Programmatic APIs

New REST APIs

NA

Deprecated REST API

NA

CLI Impact or Changes

  • NA

UI Impact or Changes

  • NA

Security Impact 

None, since the upgrade operations will happen in AppFabric in background threads and that process already has the privileges to modify these datasets.

Impact on Infrastructure Outages 

Background upgrade threads will set the upgraded CDAP version only after the entire upgrade is complete. Until then, the respective stores will keep starting the upgrade thread. If a write to HBase fails, the upgrade thread will retry the operation according to a specific retry strategy.
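
The retry strategy is not specified here. As one possibility, a bounded retry with exponential backoff could be sketched as below; the StoreWriteError exception, the function name, and the attempt and delay values are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class StoreWriteError(Exception):
    """Hypothetical stand-in for a failed write to the underlying table."""


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.01,
                       retry_on: Tuple[Type[BaseException], ...] = (StoreWriteError,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on the given errors with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up; the version marker stays unset
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

If the retries are exhausted, the error propagates and the upgraded version is never recorded, so the store would start the upgrade thread again later. The same wrapper shape could also be used for store operations that hit a transaction conflict, by passing a different exception type in retry_on.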

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release 4.1.1

Related Work

Future work

  • Parallelize the coprocessor upgrade step for as long as that step is still required