...

Scenario 3. A Dataset is Maintained by a Single Organization and Shared with Many Applications

For example, a CustomerDirectory dataset is maintained by organization C in an enterprise. C provides this dataset in namespace C for applications in other namespaces.

 
  • This dataset has a custom API: CustomerInfo getCustomer(String id). Applications that use this dataset need to include a dependency on customer-api-1.0 in their pom in order to compile and package. The dataset type must implement the CustomerDirectory interface, for example with a class TableBasedCustomerDirectory in artifact customer-table-1.3.1. At runtime, when the app calls getDataset(), CDAP determines that the dataset instance has that type and version, and loads the class from that artifact.
  • The actual dataset type has more methods in its API, including one that allows adding new customers. Therefore, the app that maintains this dataset includes the implementing artifact in its pom file.
  • The implementation can be updated without changing the API. In this case, C deploys a new artifact customer-table-1.3.2 and upgrades the dataset to this version. Whether the maintaining app must also be upgraded depends on how it bundles the artifact: if it uses provided scope, it automatically picks up the new jar upon restart; if it uses included scope, it must be updated to the new version and redeployed.
  • The implementation can be updated with an interface change, for example, adding a new field to CustomerInfo. To make this update seamless, a new artifact customer-table-1.4.0 is deployed first, and both the dataset and the maintaining app are upgraded to this version. Then a new version of the API, customer-api-1.1, is deployed, and apps may now upgrade to this version. If they don't, they will not see the new field, but that is fine for existing apps because their code does not use this field. Note that this requires CustomerInfo to be an interface (consisting mainly of getters) that has an implementation in the customer-table artifact.
  • Questions:
    • What is the deployment mechanism for the two artifacts (customer-api and customer-table)?
    • How does CDAP know that customer-table implements customer-api? Does it have to know?
    • How can C migrate the dataset to a new data format without having control over the apps that consume it? Even after upgrading the dataset to a new version, C does not know when all apps have picked that up, because they may have long-running programs, such as a flow or service, that must be restarted to pick up the new version.
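
The split between the API artifact and the implementing artifact described in this scenario can be sketched in plain Java. This is a minimal illustration only, not CDAP code: the in-memory map stands in for the underlying table, the getId(), getName(), and getEmail() getters and the addCustomer() signature are hypothetical, and only the names CustomerDirectory, CustomerInfo, getCustomer(String id), and TableBasedCustomerDirectory come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point, placed first so the file can be launched directly.
class CustomerDirectoryDemo {
  public static void main(String[] args) {
    // Maintaining app: compiles against the implementing artifact,
    // so it can call the extra write method.
    TableBasedCustomerDirectory maintained = new TableBasedCustomerDirectory();
    maintained.addCustomer("c-1", "Alice", "alice@example.com");

    // Consuming app: compiles against the API artifact only,
    // so it sees just the read-only interface.
    CustomerDirectory directory = maintained;
    CustomerInfo info = directory.getCustomer("c-1");
    System.out.println(info.getId() + " " + info.getName());
  }
}

// --- would live in the API artifact (customer-api) ---

// Getter-only interface; the getters are hypothetical.
interface CustomerInfo {
  String getId();
  String getName();
}

interface CustomerDirectory {
  CustomerInfo getCustomer(String id);
}

// --- would live in the implementing artifact (customer-table) ---

class TableBasedCustomerDirectory implements CustomerDirectory {
  // Stand-in for the real table storage (assumption for this sketch).
  private final Map<String, TableCustomerInfo> rows = new HashMap<>();

  @Override
  public CustomerInfo getCustomer(String id) {
    return rows.get(id); // null if the customer is unknown
  }

  // Write method that is not part of the shared API.
  public void addCustomer(String id, String name, String email) {
    rows.put(id, new TableCustomerInfo(id, name, email));
  }

  static final class TableCustomerInfo implements CustomerInfo {
    private final String id;
    private final String name;
    private final String email;

    TableCustomerInfo(String id, String name, String email) {
      this.id = id;
      this.name = name;
      this.email = email;
    }

    @Override public String getId() { return id; }
    @Override public String getName() { return name; }

    // A field added in a newer implementation. Code compiled against the
    // older CustomerInfo interface never calls it and keeps working.
    public String getEmail() { return email; }
  }
}
```

Because consuming apps hold only a CustomerInfo reference, the implementation can carry the extra getEmail() getter before a newer API version exposes it, which is why the scenario needs CustomerInfo to be an interface rather than a concrete class.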

 

Scenario 1 can still be kept very simple by using implicit versioning (for example, using the artifact's version as the dataset type version). 
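
As a rough sketch of what implicit versioning could look like, the dataset type version can fall back to the artifact's version whenever no explicit version is declared. The ArtifactId class, the parse() helper, and resolveTypeVersion() are hypothetical names for this illustration, and the parsing assumes artifact names of the form name-version with no dash inside the version (as in customer-table-1.3.1).

```java
// Demo entry point, placed first so the file can be launched directly.
class ImplicitVersionDemo {
  public static void main(String[] args) {
    ArtifactId artifact = ArtifactId.parse("customer-table-1.3.1");
    // No explicit dataset type version given: the artifact version is used.
    System.out.println(ArtifactId.resolveTypeVersion(null, artifact));
  }
}

// Hypothetical value class: an artifact name plus its version.
final class ArtifactId {
  final String name;
  final String version;

  ArtifactId(String name, String version) {
    this.name = name;
    this.version = version;
  }

  // Splits "customer-table-1.3.1" at the last dash into name and version.
  // Assumes the version itself contains no dash.
  static ArtifactId parse(String artifact) {
    int dash = artifact.lastIndexOf('-');
    if (dash <= 0 || dash == artifact.length() - 1) {
      throw new IllegalArgumentException("Expected name-version: " + artifact);
    }
    return new ArtifactId(artifact.substring(0, dash), artifact.substring(dash + 1));
  }

  // Implicit versioning: use the explicit version if one was declared,
  // otherwise default to the version of the artifact that contains the type.
  static String resolveTypeVersion(String explicitVersion, ArtifactId artifact) {
    return explicitVersion != null ? explicitVersion : artifact.version;
  }
}
```

With this default, a developer in the simple case never has to state a dataset type version at all.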

...