Dataset Instance Configuration

[Note: "As a user" refers to app developers, data scientists, dev-ops, Hydrator users, or pipeline developers.]

  1. As a user, when creating a dataset instance, I want to find out what properties are supported by the dataset type, what values are allowed, and what the defaults are. 
  2. As a user, I want to specify the schema of a dataset in a uniform way across all dataset types.
  3. As a user, I want to specify schema as a JSON string (verbose, Avro-style).
  4. As a user, I want to specify schema as a SQL schema string (brief, Hive-style).
  5. As a user, I want to configure time-to-live (TTL) in a uniform way across all dataset types. 
  6. As a user, I want to see the properties that were used to configure a dataset instance.
  7. As a user, I want to find out what properties of a dataset can be updated.  
  8. As a user, I want to update the properties of a dataset instance. I expect this to fail if the new properties are not compatible, with a meaningful error message.
  9. As a user, I want to update a single property of a dataset instance, without knowing all other properties. For example, set the TTL without having to know the schema. 
  10. As a user, I want to remove a single property of a dataset instance, without knowing all other properties. For example, remove the TTL without having to know the schema. 
  11. As a user, I want to trigger a migration process for a dataset if updating its properties requires that.
  12. As a user, I expect that if reconfiguration of a dataset fails, then no changes have taken effect. In other words, all steps required to reconfigure a dataset must be done as a single atomic action.
  13. As an app developer, I expect that application creation fails if any of its datasets cannot be created.
  14. As an app developer, I expect that application redeployment fails if any of its datasets cannot be reconfigured (if the new app spec specifies a different configuration).
  15. As an app developer, when creating a dataset as part of app deployment, I want to tolerate existing datasets if their properties are different but compatible. For example, I can configure the dataset schema, but leave the existing TTL of a table untouched.
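
Stories 9, 10, and 12 above (set or remove a single property without knowing the others, with all-or-nothing semantics) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `reconfigure` and `check_compatible`, the property keys `schema` and `ttl`, and the two compatibility rules are hypothetical and do not describe an existing API.

```python
class IncompatibleUpdateError(Exception):
    """Raised when the new properties are not compatible with the old ones."""


def check_compatible(old, new):
    # Hypothetical rule: a TTL, if present, must be a positive integer.
    ttl = new.get("ttl")
    if ttl is not None and (not isinstance(ttl, int) or ttl <= 0):
        raise IncompatibleUpdateError(
            f"ttl must be a positive integer, got {ttl!r}")
    # Hypothetical rule: a schema cannot be dropped once it has been set.
    if "schema" in old and "schema" not in new:
        raise IncompatibleUpdateError(
            "schema cannot be removed from an existing dataset")


def reconfigure(current, set_props=None, remove_props=()):
    """Return new properties, or raise and leave `current` untouched."""
    # Work on a copy so a failed update has no visible effect.
    candidate = dict(current)
    candidate.update(set_props or {})
    for key in remove_props:
        candidate.pop(key, None)
    # Validate the complete result before anything is committed.
    check_compatible(current, candidate)
    return candidate
```

With this shape, `reconfigure(props, set_props={"ttl": 7200})` changes the TTL without the caller supplying the schema, and `reconfigure(props, remove_props=["ttl"])` removes it. An incompatible request raises an error with a descriptive message and leaves `props` unchanged.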

Explore Integration

  1. As a user, I want to specify as part of dataset configuration whether it is explorable.
  2. As a user, I do not want to specify the explore schema (and format) as separate properties if they can be derived from other standard dataset properties.
  3. As a user, I want to specify the explore schema separately (for example, only include a subset of the fields of a table, or name fields differently).
  4. As a user, I expect that dataset creation fails if the dataset cannot be enabled for explore.
  5. As a user, I expect that dataset reconfiguration fails if the corresponding update of the explore table fails.
  6. As a user, I expect that a dataset operation fails if it fails to make its required changes to explore.
  7. As a user, I expect that an update of explore never leads to silent loss of data (or of data available for explore). If, for example, partitions would be dropped from the explore table, I want to have the option to either fail the update, or to be notified of the drop and have a tool to bring explore in sync with the data.
  8. As a user, I want to enable explore for a dataset that was not configured for explore initially.
  9. As a user, I want to disable explore for a dataset that was configured for explore initially.
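
Stories 2 and 3 above (derive the explore schema from standard properties unless it is given separately) might look like the following sketch. The property keys `schema` and `explore.schema`, the function names, and the restriction to primitive field types are all assumptions for illustration; the type table covers only a handful of Avro primitives and their usual Hive counterparts.

```python
import json

# Avro primitive types mapped to Hive column types (primitives only).
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def hive_schema_from_avro(avro_json):
    """Convert an Avro-style JSON record schema to a Hive-style column list."""
    record = json.loads(avro_json)
    columns = []
    for field in record["fields"]:
        avro_type = field["type"]
        # Unions, nested records, etc. are out of scope for this sketch.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_HIVE:
            raise ValueError(
                f"unsupported type for field {field['name']!r}: {avro_type!r}")
        columns.append(f"{field['name']} {AVRO_TO_HIVE[avro_type]}")
    return ", ".join(columns)


def explore_schema(properties):
    """Use the explicit explore schema if given, else derive it."""
    explicit = properties.get("explore.schema")  # hypothetical property key
    if explicit is not None:
        return explicit
    return hive_schema_from_avro(properties["schema"])
```

A dataset configured only with a `schema` property would get its explore schema derived automatically, while a dataset that also sets `explore.schema` (for example, to expose a subset of the fields) keeps the explicit value.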