Table of Contents

...

Metadata Storage and Indexing

In the current implementation of MetadataDataset, the stored key is a toString representation of the EntityId, i.e.
EntityType.entitydetails.key. For example, for a dataset it looks like:

datasetDatasetInstance:namespace.datasetName.metadataKey

Note: We store the old Id representation rather than the EntityId to keep backward compatibility with previously serialized keys. During this release, when we upgrade the metadata store, we should definitely migrate all keys away from the old Ids to a serialization form that is independent of EntityIds, so that our serialization does not break when EntityIds are renamed or changed.

We do this because it allows us to serve search queries like dataset:* as well as getMetadata() queries for an entity such as a Dataset.

For more information please refer to earlier design documentation of our metadata store and the implementation here:

Storage Design

MdsKey

With the changes proposed in this design document, we will introduce a class called MetadataEntity, which will be a List of key-value pairs. In a simple representation it will look like:

...

Since a file is not an EntityId in CDAP, CDAP itself does not know the hierarchy of this custom entity type. Hence CDAP will not be able to construct the MetadataEntity back, since the individual keys are not all persisted in the above format. To solve this issue we will now store the MetadataEntity information with all the key-value pairs. To maintain backward compatibility and support search based on the entity type, we will also keep prefixing the key with the target entity type, as before. The final key will look something like this:

<length-encoding>file<length-encoding>namespace=nsOne<length-encoding>dataset=dsOne<length-encoding>partition=partitionOne<length-encoding>file=fileOne
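As a rough illustration of the key layout above, the following sketch encodes the target type followed by each key-value pair, and then decodes the row key back into its parts. The class name MetadataKeySketch, the encode/decode helpers, and the choice of a 4-byte big-endian integer as the length encoding are all assumptions made for this example; they are not taken from MdsKey or any other CDAP class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual CDAP MdsKey implementation.
public class MetadataKeySketch {

  // Writes one part as a 4-byte big-endian length followed by its UTF-8 bytes.
  // The 4-byte length prefix is an assumed encoding for illustration.
  private static void writePart(ByteArrayOutputStream out, String part) {
    byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
    byte[] length = ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array();
    out.write(length, 0, length.length);
    out.write(bytes, 0, bytes.length);
  }

  // Encodes the target entity type first, then every key=value pair in order.
  public static byte[] encode(String targetType, Map<String, String> pairs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writePart(out, targetType);
    for (Map.Entry<String, String> pair : pairs.entrySet()) {
      writePart(out, pair.getKey() + "=" + pair.getValue());
    }
    return out.toByteArray();
  }

  // Decodes a row key back into its parts: the target type, then the pairs.
  public static List<String> decode(byte[] rowKey) {
    ByteBuffer buffer = ByteBuffer.wrap(rowKey);
    List<String> parts = new ArrayList<>();
    while (buffer.hasRemaining()) {
      byte[] bytes = new byte[buffer.getInt()];
      buffer.get(bytes);
      parts.add(new String(bytes, StandardCharsets.UTF_8));
    }
    return parts;
  }

  public static void main(String[] args) {
    // LinkedHashMap preserves the hierarchy order of the key-value pairs.
    Map<String, String> file = new LinkedHashMap<>();
    file.put("namespace", "nsOne");
    file.put("dataset", "dsOne");
    file.put("partition", "partitionOne");
    file.put("file", "fileOne");
    System.out.println(decode(encode("file", file)));
  }
}
```

Because every pair is persisted, the full hierarchy can be read back from the key alone, and because the target type is written first, all keys of one type share a common prefix that a type-based scan could rely on.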

Search Queries:

We will maintain support for all search queries as listed here for backward compatibility. No new search capabilities will be added.

...