...
Dataset Instance Configuration
...
[Note: "As a user" refers to app developers, data scientists, dev-ops, Hydrator users, or pipeline developers.]
- As a user, when creating a dataset instance, I want to find out what properties are supported by the dataset type, what values are allowed, and what the defaults are.
- As a user, I want to specify the schema of a dataset in a uniform way across all dataset types.
- As a user, I want to specify schema as a JSON string (verbose, Avro-style).
- As a user, I want to specify schema as a SQL schema string (brief, Hive-style).
- As a user, I want to configure time-to-live (TTL) in a uniform way across all dataset types.
- As a user, I want to see the properties that were used to configure a dataset instance.
- As a user, I want to find out what properties of a dataset can be updated.
- As a user, I want to update the properties of a dataset instance. I expect this to fail if the new properties are not compatible, with a meaningful error message.
- As a user, I want to update a single property of a dataset instance, without knowing all other properties. For example, set the TTL without having to know the schema.
- As a user, I want to remove a single property of a dataset instance, without knowing all other properties. For example, remove the TTL without having to know the schema.
- As a user, I want to trigger a migration process for a dataset if updating its properties requires that.
- As a user, I expect that if reconfiguration of a dataset fails, then no changes have taken effect. In other words, all steps required to reconfigure a dataset must be done as a single atomic action.
- As an app developer, I expect that application creation fails if any of its datasets cannot be created.
- As an app developer, I expect that application redeployment fails if any of its datasets cannot be reconfigured (if the new app spec specifies a different configuration).
- As an app developer, when creating a dataset as part of app deployment, I want to tolerate existing datasets if their properties are different but compatible. For example, I can configure the dataset schema, but leave the existing TTL of a table untouched.
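The stories about updating or removing a single property, failing on incompatible changes, and leaving nothing changed on failure can be illustrated with a small sketch. This is not an existing API: the function `apply_property_changes`, the exception `IncompatibleUpdateError`, the property names `schema` and `ttl`, and the rule that a schema may not change are all hypothetical, chosen only to make the expected behavior concrete.

```python
from typing import Callable, Dict, Iterable, Optional

Properties = Dict[str, str]


class IncompatibleUpdateError(Exception):
    """Raised when the new properties are not compatible with the old ones."""


def apply_property_changes(
    current: Properties,
    set_props: Optional[Properties] = None,
    remove_keys: Optional[Iterable[str]] = None,
    is_compatible: Callable[[Properties, Properties], bool] = lambda old, new: True,
) -> Properties:
    """Return a new property map; the caller never has to pass unchanged properties."""
    # Work on a copy so that 'current' is untouched if the update is rejected.
    updated = dict(current)
    for key in remove_keys or ():
        updated.pop(key, None)
    updated.update(set_props or {})

    if not is_compatible(current, updated):
        changed = sorted(
            k for k in set(current) | set(updated)
            if current.get(k) != updated.get(k)
        )
        raise IncompatibleUpdateError(
            f"incompatible change to properties: {changed}"
        )
    return updated


def schema_unchanged(old: Properties, new: Properties) -> bool:
    # Hypothetical compatibility rule, for illustration only.
    return old.get("schema") == new.get("schema")


# Set the TTL without knowing (or resending) the schema.
props = {"schema": "id INT, name STRING", "ttl": "3600"}
with_new_ttl = apply_property_changes(
    props, set_props={"ttl": "7200"}, is_compatible=schema_unchanged
)
# Remove the TTL, again without touching the schema.
without_ttl = apply_property_changes(
    props, remove_keys=["ttl"], is_compatible=schema_unchanged
)
```

Because the check runs against a copy before anything is returned, a rejected update leaves the original properties exactly as they were. A real implementation would also have to cover storage-level changes and any explore update within the same atomic step.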
Explore Integration
- As a user, I want to specify as part of dataset configuration whether it is explorable.
- As a user, I do not want to specify the explore schema (and format) as separate properties if they can be derived from other standard dataset properties.
- As a user, I want to specify the explore schema separately (for example, only include a subset of the fields of a table, or name fields differently).
- As a user, I expect that dataset creation fails if the dataset cannot be enabled for explore.
- As a user, I expect that dataset reconfiguration fails if the corresponding update of the explore table fails.
- As a user, I expect that a dataset operation fails if it fails to make its required changes to explore.
- As a user, I expect that an update of explore never leads to silent loss of data (or data available for explore). If, for example, partitions would be dropped from the explore table, I want to have the option to either fail the update, or to be notified of the drop and have a tool to bring explore in sync with the data.
- As a user, I want to enable explore for a dataset that was not configured for explore initially.
- As a user, I want to disable explore for a dataset that was configured for explore initially.
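The first few explore stories describe an order of precedence: explore may be switched on or off, an explicit explore schema wins if given, and otherwise the schema is derived from the standard dataset properties. A minimal sketch of that resolution follows. The function name `resolve_explore_schema` and the property names `explore.enabled`, `explore.schema` and `schema` are assumptions made for illustration, as is the choice to treat explore as disabled when the flag is absent.

```python
from typing import Dict, Optional


def resolve_explore_schema(properties: Dict[str, str]) -> Optional[str]:
    """Return the schema to use for explore, or None if explore is disabled."""
    # Assumption: explore is off unless explicitly enabled.
    enabled = properties.get("explore.enabled", "false").lower() == "true"
    if not enabled:
        return None

    # An explicitly configured explore schema takes precedence, for example
    # to expose only a subset of the fields or to rename them.
    explicit = properties.get("explore.schema")
    if explicit is not None:
        return explicit

    # Otherwise derive it from the standard dataset schema property.
    derived = properties.get("schema")
    if derived is None:
        # Fail instead of silently creating a dataset that cannot be explored.
        raise ValueError(
            "dataset is explorable but no schema is available for explore"
        )
    return derived
```

Raising an error in the last case mirrors the expectation that dataset creation fails if the dataset cannot be enabled for explore.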