Goal

In CDAP 4.0, the main theme for Datasets is establishing proper and semantically sound dataset management. That includes the management of dataset types (code) and the management of dataset instances (actual data) throughout their life cycle. The current dataset framework has various shortcomings that need to be addressed. This document discusses each area of improvement, lists end-to-end use cases and requirements, and finally presents the design that implements the requirements.

...

  •  User stories documented (Andreas)
  •  User stories reviewed (Nitin)
  •  User stories reviewed (Todd)
  •  Requirements documented (Andreas)
  •  Requirements Reviewed
  •  Mockups Built
  •  Design Built
  •  Design Accepted

...

  1. As an app developer, I want to include the code of a dataset type in my app artifact, and create a dataset of that type when deploying the app.
  2. As an app developer, I want to deploy a new version of a dataset type as part of deploying a new version of the app that includes it, and I expect that all dataset instances of that type that were created as part of the app deployment start using the new code. 
  3. As an app developer, I want to share a dataset type that I had previously deployed as part of an app.
  4. As an app developer, I want to deploy a new version of a dataset type as part of an app artifact, without affecting other datasets of this type.
  5. As an app developer, I want to explore a dataset instance of a type that was deployed as part of an app.
  6. As an app developer, I expect that deploying an artifact without creating an app will not create any dataset types or instances (that is, this only happens when creating an app).
  7. As an app developer, I want to share a dataset type across multiple applications that include the dataset type's code in their artifacts.
  8. As an app developer, when deploying a new version of an app that includes a shared dataset type, I expect that all dataset instances created by this app start using the new code, but all dataset instances created by other apps remain unchanged.
  9. As an app developer, I want to deploy a new version of an app that includes an older version of a dataset type deployed by another app, and I expect that the dataset instances created by this app use the dataset type code included in this app.
  10. As an app developer, when I deploy a new version of an app that includes a different version of a dataset type deployed by another app, and this app shares a dataset instance of this type with the other app, I expect the deployment to fail with a version conflict error. (Otherwise I might "downgrade" the instance to an older version, making it incompatible with the other app.)
    Note: This use case needs discussion. What is the proper behavior? How can we prevent data corruption due to an unintentional "downgrade" without restricting ease of use too much?
  11. As a dataset developer, I want to deploy a dataset type independent from any app, and allow apps to create and use dataset instances of that type.
  12. As a dataset developer, I need an archetype that helps me package my dataset type properly.
  13. As a dataset developer, I want to separate the interface from the implementation of a dataset type.
  14. As an app developer, I want to only depend on the interface of a dataset type in my app, and have the system inject the implementation at runtime. 
  15. As an app developer, I want to write unit tests for an app that depends on the interface of a dataset type. (This means I need an extra dependency with test scope in my pom.xml.)
  16. As a dataset developer, I want to assign explicit versions to the code of a dataset type.
  17. As a dataset developer, I want to deploy a new version of a dataset type without affecting the dataset instances of that type.
  18. As an app developer, I want to create a dataset instance with a specific version of a dataset type. 
  19. As a dataset developer, I want to have the option of implementing an "upgrade step" for when a dataset instance is upgraded to a new version of the dataset type.
  20. As a dataset developer, I want to have a way to reject an upgrade of a dataset instance to a newer version of its type, if the upgrade is not compatible.
  21. As a dataset developer, I want to have the option of implementing a migration procedure that can be run after an upgrade of a dataset instance to a new version of its type. This can be a long-running (background) process.
  22. As a dataset developer, I want to implement custom administrative operations (such as "compaction" or "rebalance") that are not common to all dataset types.
  23. As an app developer, I want to perform custom administrative operations on dataset instances from my app, the CLI, REST, or the UI. 
  24. As a dataset developer, I want to explore a dataset instance created from a dataset type that was deployed by itself. 
  25. As a dataset developer, I want to delete outdated versions of a dataset type. I expect this to fail if there are any dataset instances with that version of the type. 
  26. As a dataset developer, I want to list all dataset instances that use a dataset type, or a specific version of a type.
  27. As a data scientist or app developer, I want to be able to create a dataset instance of an existing dataset type without writing code.
  28. As a data scientist or app developer, I want to be able to upgrade a dataset instance to a new version of its code.
  29. As a hydrator user, I want to create a pipeline that reads or writes an existing dataset instance.
  30. As a hydrator user, I want to create a pipeline that reads or writes a new dataset instance, and I want to create that dataset instance as part of pipeline creation. 
  31. As a hydrator user, I want to specify an explicit version of the dataset types of the dataset instances created by my pipeline, and I expect pipeline creation to fail (similar to app creation) if that results in incompatible upgrade of an existing dataset instance that is shared with other apps or pipelines.
  32. As a hydrator user, I want to explore the datasets created by my pipeline.
  33. As a hydrator user, I expect all dataset instances created by apps to be available as sinks and sources for pipelines (if there is a corresponding plugin).
  34. As an app developer, I expect all dataset instances created by Hydrator pipelines to be accessible to the app.
  35. As a plugin developer, I want to include the code for a dataset type in the plugin artifact. When a pipeline using this plugin is created, a dataset instance of that type is created, and it is explorable and available to apps.
  36. As a plugin developer, I want to use a custom dataset type (that was deployed independently or as part of an app) inside the plugin. 
  37. As a plugin developer, I want to upgrade the code of a dataset type used by a dataset instance created by that plugin, when I deploy a new version of the plugin and update the pipeline to use that version.
  38. As a pipeline developer, I want to upgrade a dataset instance to a newer version of the code after the pipeline was created. 
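
The version rules implied by stories 8, 10, and 25 can be illustrated with a small, self-contained sketch. This is not CDAP code: the class `DatasetVersionSketch`, its nested `Registry`, and the methods `deploy`, `deleteTypeVersion`, and `versionOf` are hypothetical names invented for illustration, versions are plain integers, and only a single dataset type is modeled. The conflict rule shown for story 10 is one possible interpretation; that story is still marked as needing discussion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are CDAP APIs.
class DatasetVersionSketch {

  /** In-memory stand-in for the bookkeeping a dataset service would need. */
  static class Registry {
    // instance name -> version of the dataset type the instance currently uses
    private final Map<String, Integer> instanceVersion = new HashMap<>();
    // instance name -> names of the apps that use the instance
    private final Map<String, Set<String>> instanceUsers = new HashMap<>();

    /**
     * Records that an app was deployed with the given type version bundled
     * in its artifact. An instance used only by this app is moved to the new
     * version (story 8). An instance shared with another app is never
     * silently changed; the deployment fails instead (story 10).
     */
    void deploy(String app, String instance, int typeVersion) {
      Integer current = instanceVersion.get(instance);
      Set<String> users = instanceUsers.computeIfAbsent(instance, k -> new HashSet<>());
      boolean sharedWithOthers = users.stream().anyMatch(u -> !u.equals(app));
      if (current != null && current != typeVersion && sharedWithOthers) {
        throw new IllegalStateException("Version conflict on '" + instance
            + "': in use at version " + current + ", app '" + app
            + "' bundles version " + typeVersion);
      }
      instanceVersion.put(instance, typeVersion);
      users.add(app);
    }

    /** Deleting a type version fails while any instance still uses it (story 25). */
    void deleteTypeVersion(int typeVersion) {
      if (instanceVersion.containsValue(typeVersion)) {
        throw new IllegalStateException(
            "Type version " + typeVersion + " is still used by an instance");
      }
    }

    /** Returns the type version an instance uses, or null if it is unknown. */
    Integer versionOf(String instance) {
      return instanceVersion.get(instance);
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.deploy("appA", "events", 1);
    registry.deploy("appA", "events", 2);   // sole user: instance is upgraded
    registry.deploy("appB", "events", 2);   // second app shares the instance
    try {
      registry.deploy("appB", "events", 1); // would downgrade a shared instance
    } catch (IllegalStateException e) {
      System.out.println("Deployment rejected: " + e.getMessage());
    }
  }
}
```

A real implementation would also have to track multiple dataset types, decide what counts as a compatible version, and handle the upgrade and migration hooks from stories 19 through 21, all of which this sketch leaves out.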

...