Dataset Instance Configuration

[Note: "As a user" refers to app developers, data scientists, dev-ops, Hydrator users, or pipeline developers.]

As a user, when creating a dataset instance, I want to find out what properties are supported by the dataset type, what values are allowed, and what the defaults are.
As a user, I want to specify the schema of a dataset in a uniform way across all dataset types.
As a user, I want to specify schema as a JSON string (verbose, Avro-style).
As a user, I want to specify schema as a SQL schema string (brief, Hive-style).
As a user, I want to configure time-to-live (TTL) in a uniform way across all dataset types.
As a user, I want to see the properties that were used to configure a dataset instance.
As a user, I want to find out what properties of a dataset can be updated.
As a user, I want to update the properties of a dataset instance. I expect this to fail if the new properties are not compatible, with a meaningful error message.
As a user, I want to update a single property of a dataset instance, without knowing all other properties. For example, set the TTL without having to know the schema.
As a user, I want to remove a single property of a dataset instance, without knowing all other properties. For example, remove the TTL without having to know the schema.
As a user, I want to trigger a migration process for a dataset if updating its properties requires that.
As a user, I expect that if reconfiguration of a dataset fails, then no changes have taken effect. In other words, all steps required to reconfigure a dataset must be done as a single atomic action.
As an app developer, I expect that application creation fails if any of its datasets cannot be created.
As an app developer, I expect that application redeployment fails if any of its datasets cannot be reconfigured (if the new app spec specifies different configuration).
As an app developer, when creating a dataset as part of app deployment, I want to tolerate existing datasets if their properties are different but compatible. For example, I can configure the dataset schema, but leave the existing TTL of a table untouched.
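
The single-property update, single-property removal, and "tolerate existing datasets" stories above can be illustrated with a minimal sketch. It assumes dataset properties are a flat string-to-string map. The function names `set_property`, `remove_property`, and `merge_for_deploy`, and the property keys `schema` and `ttl`, are hypothetical and do not refer to any existing API. The merge rule shown (properties requested by the app override, everything else is left as it is) is only one possible reading of "different but compatible".

```python
from typing import Dict

Properties = Dict[str, str]


def set_property(current: Properties, key: str, value: str) -> Properties:
    """Return a copy of the properties with one key set; all others are untouched."""
    updated = dict(current)
    updated[key] = value
    return updated


def remove_property(current: Properties, key: str) -> Properties:
    """Return a copy of the properties without the given key (no error if absent)."""
    return {k: v for k, v in current.items() if k != key}


def merge_for_deploy(existing: Properties, requested: Properties) -> Properties:
    """Overlay the properties requested by an app spec on the existing ones.

    Keys the app does not mention (for example an existing TTL) are preserved.
    """
    merged = dict(existing)
    merged.update(requested)
    return merged


if __name__ == "__main__":
    props = {"schema": "id int, name string", "ttl": "3600"}
    print(set_property(props, "ttl", "7200"))
    print(remove_property(props, "ttl"))
    print(merge_for_deploy(props, {"schema": "id int, name string, age int"}))
```

Each function returns a new map instead of mutating its input, so a caller can validate the result before anything is persisted.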

Explore Integration

As a user, I want to specify as part of dataset configuration whether it is explorable.
As a user, I do not want to specify the explore schema (and format) as separate properties if they can be derived from other standard dataset properties.
As a user, I want to specify the explore schema separately (for example, only include a subset of the fields of a table, or name fields differently).
As a user, I expect that dataset creation fails if the dataset cannot be enabled for explore.
As a user, I expect that dataset reconfiguration fails if the corresponding update of the explore table fails.
As a user, I expect that a dataset operation fails if it fails to make its required changes to explore.
As a user, I expect that an update of explore never leads to silent loss of data (or of data available for explore). If, for example, partitions would be dropped from the explore table, I want the option either to fail the update, or to be notified of the drop and have a tool to bring explore back in sync with the data.
As a user, I want to enable explore for a dataset that was not configured for explore initially.
As a user, I want to disable explore for a dataset that was configured for explore initially.
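
The expectation that a reconfiguration fails as a whole when the explore update fails, leaving no changes behind, can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: the in-memory `store`, the `reconfigure` function, the `update_explore` callback, and `ReconfigurationError` are all hypothetical names, and a real system would also need to undo changes in the storage layer and in explore itself.

```python
from typing import Callable, Dict

Properties = Dict[str, str]


class ReconfigurationError(Exception):
    """Raised when a dataset reconfiguration could not be applied."""


def reconfigure(store: Dict[str, Properties], name: str, new_props: Properties,
                update_explore: Callable[[str, Properties], None]) -> None:
    """Apply new properties, then update explore; restore the old ones on failure."""
    old_props = store[name]
    store[name] = dict(new_props)
    try:
        update_explore(name, new_props)  # hypothetical hook into explore
    except Exception as exc:
        store[name] = old_props  # roll back so that no change takes effect
        raise ReconfigurationError(
            f"explore update failed for dataset '{name}': {exc}") from exc
```

The error message names the dataset and the underlying cause, in line with the requirement that failures carry a meaningful message.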
