...

A dataset instance is configured by passing a set of properties (that is, string-to-string pairs) to the configure() method of the dataset type. However:

  • Common properties such as schema are not standardized across dataset types
  • There is no way (other than reading documentation) to find out which properties a dataset type accepts. For a wizard-driven UI we would need a programmatic API to list all configurations. For plugins and apps, we have a very good way to include that in the implementation of the plugin. Datasets should have something similar.
  • Reconfiguration of a dataset can be problematic. Sometimes the change of a property is not compatible with existing data in a dataset (for example, changing the schema). There is no easy way to find out which properties can be changed.
  • Also, a reconfiguration may require a data migration or other long-running process to implement the change. The current dataset framework has no APIs to implement that. 
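
To make the gaps listed above concrete, here is a minimal sketch of what a programmatic property listing could look like. Every name in it (PropertySpec, DescribesProperties, PropertyValidator, findIllegalChanges) is hypothetical and invented for illustration; none of it is part of the existing dataset framework. It assumes each dataset type can declare, per property, whether it is required and whether it may be changed once data exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical descriptor for one property that a dataset type accepts. */
final class PropertySpec {
  final String name;
  final String description;
  final boolean required;
  /** False if changing the value is incompatible with existing data (for example, a schema). */
  final boolean updatable;

  PropertySpec(String name, String description, boolean required, boolean updatable) {
    this.name = name;
    this.description = description;
    this.required = required;
    this.updatable = updatable;
  }
}

/** Hypothetical interface a dataset type could implement so that a UI can list its properties. */
interface DescribesProperties {
  List<PropertySpec> getPropertySpecs();
}

/** Hypothetical helper that checks a reconfiguration against the declared specs. */
final class PropertyValidator {
  /** Returns the names of non-updatable properties whose values differ between the two maps. */
  static List<String> findIllegalChanges(List<PropertySpec> specs,
                                         Map<String, String> oldProps,
                                         Map<String, String> newProps) {
    List<String> illegal = new ArrayList<>();
    for (PropertySpec spec : specs) {
      boolean changed = !Objects.equals(oldProps.get(spec.name), newProps.get(spec.name));
      if (changed && !spec.updatable) {
        illegal.add(spec.name);
      }
    }
    return illegal;
  }
}
```

With something along these lines, a wizard-driven UI could render one input per PropertySpec, and a reconfiguration request could be rejected up front when findIllegalChanges returns a non-empty list. The sketch does not address data migration or other long-running changes.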

...

The dataset framework defines five administrative APIs: create(), exists(), drop(), truncate(), and update() (and a sixth, upgrade(), which is broken). However, many dataset types have specific administrative procedures that are not common across types. For example, an HBase table may require compaction, which is not supported by other dataset types. We need a way to implement such actions as part of the dataset administration interface.
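
One possible shape for such type-specific actions is sketched below. It is an illustration only: CustomAdminOperations and HBaseTableAdminSketch are hypothetical names, not existing framework classes, and the "compact" operation merely increments a counter instead of calling HBase. The assumption is that a dataset type advertises its extra operations by name so that tools can discover and invoke them.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical extension point for admin operations beyond the common ones. */
interface CustomAdminOperations {
  /** Names of the type-specific operations this dataset type supports. */
  Set<String> listOperations();

  /** Runs the named operation; throws if the dataset type does not support it. */
  void execute(String operation, Map<String, String> args);
}

/** Illustrative stand-in for an HBase-backed dataset admin; it makes no real HBase calls. */
final class HBaseTableAdminSketch implements CustomAdminOperations {
  private final Map<String, Runnable> operations = new LinkedHashMap<>();
  int compactions = 0;

  HBaseTableAdminSketch() {
    // Placeholder action: a real implementation would trigger a table compaction here.
    operations.put("compact", () -> compactions++);
  }

  @Override
  public Set<String> listOperations() {
    return Collections.unmodifiableSet(operations.keySet());
  }

  @Override
  public void execute(String operation, Map<String, String> args) {
    Runnable action = operations.get(operation);
    if (action == null) {
      throw new UnsupportedOperationException("Unknown admin operation: " + operation);
    }
    action.run();
  }
}
```

A dataset type without special procedures would simply return an empty set from listOperations(), so callers can treat all types uniformly.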

...