Required Search Fields
Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
CDAP 6.0.0 metadata search allows users to search for multiple tags at a time, with the results being entities that have at least one of the requested tags.
The new feature presented in this design doc aims to introduce a search syntax that lets the user mark tags that every returned entity must have.
Currently CDAP runs search using Elasticsearch and an internally built NoSQL search system, and this feature is designed to be implemented in both.
Goals
Introduce a search syntax that allows the user to mark tags as required, so that only entities carrying every required tag appear in the results. The syntax must behave the same way in both the Elasticsearch and NoSQL search implementations.
User Stories
As a pipeline developer, I want to search for all datasets that contain both tag X and tag Y.
Design
This design will introduce a new internal API for handling special queries, those containing required terms, for use by both the NoSQL and Elasticsearch implementations of metadata storage.
A high level overview of how the two systems will talk to the new API to access the parsed data:
Approach
Approach #1
Create a helper class to parse the user’s request. The information extracted will include the individual terms and whether they are required or optional in the results based on their syntax. Abstracting this functionality and information will allow for both Elasticsearch and NoSQL to utilize methods from the same class. Future expansions on this system will be possible through one file.
Approach #2
Continue to parse user requests in each implementation separately, looking through the query string for the required term notation. While similar between implementations, the process of extracting and storing information regarding priority level would be left to the individual implementations to handle. This approach would be more straightforward to achieve, but ultimately harder to maintain and augment in the future.
Primary design considerations
Scalability—what if we want to add new features for search later on?
Complexity—how can we implement the feature in a way that is conceptually straightforward and effective?
Implementation
Create a QueryTerm class containing two fields:
String term;
Qualifier qualifier;
Create a QueryParser class for splitting queries into organized QueryTerm objects.
It exposes a public parse() method that takes a query string as a parameter, separates that string into individual terms by whitespace, parses each one individually for search operators, and returns them as a list of QueryTerm objects.
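The two classes described above could look roughly like the following. This is a minimal sketch, not the actual CDAP code: the Qualifier values (REQUIRED, OPTIONAL), the accessor names, the static form of parse(), and the handling of a lone "+" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Assumed qualifier values; the design only names the Qualifier type. */
enum Qualifier { REQUIRED, OPTIONAL }

/** A single search term together with its qualifier. */
final class QueryTerm {
  private final String term;
  private final Qualifier qualifier;

  QueryTerm(String term, Qualifier qualifier) {
    this.term = term;
    this.qualifier = qualifier;
  }

  String getTerm() {
    return term;
  }

  Qualifier getQualifier() {
    return qualifier;
  }
}

/** Splits a raw query string into QueryTerm objects. */
final class QueryParser {
  private static final String REQUIRED_OPERATOR = "+";

  private QueryParser() { }

  public static List<QueryTerm> parse(String query) {
    List<QueryTerm> terms = new ArrayList<>();
    // Split on runs of whitespace; a blank query yields a single empty token.
    for (String token : query.trim().split("\\s+")) {
      if (token.isEmpty()) {
        continue;
      }
      if (token.startsWith(REQUIRED_OPERATOR) && token.length() > 1) {
        // Strip the operator so callers see the bare term.
        terms.add(new QueryTerm(token.substring(1), Qualifier.REQUIRED));
      } else {
        // Assumption: a lone "+" is kept as an ordinary optional term.
        terms.add(new QueryTerm(token, Qualifier.OPTIONAL));
      }
    }
    return terms;
  }
}
```

Under these assumptions, QueryParser.parse("+tag1 tag2") returns two terms: tag1 marked REQUIRED and tag2 marked OPTIONAL. Keeping the operator handling in one place means both storage implementations see the same stripped terms.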
Elasticsearch Implementation
In the createMainQuery() method of ElasticsearchMetadataStorage.java, delegate query parsing and string formatting to the new QueryParser class, and use the resulting QueryTerm objects to make the appropriate calls to Elasticsearch’s API.
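One way required and optional terms might map onto Elasticsearch is a bool query, with required terms under "must" and optional terms under "should". The sketch below builds that query body as nested maps in plain Java so it runs without the Elasticsearch client. It is an illustration only and not the actual createMainQuery() logic: the class name BoolQuerySketch, the use of "match" clauses, and the field name passed in are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that mirrors the shape of an Elasticsearch bool query. */
final class BoolQuerySketch {
  private BoolQuerySketch() { }

  static Map<String, Object> build(String field, List<String> required, List<String> optional) {
    Map<String, Object> bool = new LinkedHashMap<>();
    if (!required.isEmpty()) {
      // Every "must" clause has to match for a document to be returned.
      bool.put("must", matchClauses(field, required));
    }
    if (!optional.isEmpty()) {
      // "should" clauses widen or rank the results without being mandatory
      // when at least one "must" clause is present.
      bool.put("should", matchClauses(field, optional));
    }
    Map<String, Object> query = new LinkedHashMap<>();
    query.put("bool", bool);
    return query;
  }

  private static List<Map<String, Object>> matchClauses(String field, List<String> terms) {
    List<Map<String, Object>> clauses = new ArrayList<>();
    for (String term : terms) {
      Map<String, Object> fieldToTerm = new LinkedHashMap<>();
      fieldToTerm.put(field, term);
      Map<String, Object> match = new LinkedHashMap<>();
      match.put("match", fieldToTerm);
      clauses.add(match);
    }
    return clauses;
  }
}
```

For the query “+tag1 tag2” and an assumed field named "tags", this yields one "must" clause for tag1 and one "should" clause for tag2. When a query has no required terms, only "should" clauses are emitted, which matches the existing behavior of returning entities that have at least one of the requested tags.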
NoSQL Implementation
Keep the original search method the same but utilize the QueryParser class to retrieve search terms that are stripped of the new syntax.
Create a class, MetadataResultEntry, which contains an instance of MetadataEntry. This new object will replace the current corresponding instances of MetadataEntry. It will also contain a string representing the term that was used to search for the MetadataEntry.
Alter the existing SearchTerm class such that it contains a QueryTerm field. It will use this field to construct MetadataResultEntry objects.
Before sorting the results, filter out the entities that do not have the tags specified as required by the user’s search query.
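The filtering step could be sketched as follows. This is a simplified illustration, not the NoSQL implementation itself: it represents each entity by a string name mapped to its set of tags, whereas the real code would work with MetadataResultEntry objects, and the class name RequiredTagFilter is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter that keeps only entities carrying every required tag. */
final class RequiredTagFilter {
  private RequiredTagFilter() { }

  static Map<String, Set<String>> filter(Map<String, Set<String>> tagsByEntity,
                                         Set<String> requiredTags) {
    // LinkedHashMap preserves the incoming order, so sorting can happen afterwards.
    Map<String, Set<String>> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> entry : tagsByEntity.entrySet()) {
      // An empty requiredTags set keeps every entity, matching the old behavior.
      if (entry.getValue().containsAll(requiredTags)) {
        kept.put(entry.getKey(), entry.getValue());
      }
    }
    return kept;
  }
}
```

With the query “+tag1 tag2”, an entity tagged only with tag2 is found by the unchanged search but dropped here, while an entity tagged only with tag1 is kept.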
API changes
New Programmatic APIs
UI Impact or Changes
Users can now mark a term as required in the metadata search bar by placing the “+” symbol directly in front of it.
For example, in the query “+tag1 tag2”:
tag1 is a ‘required’ term
tag2 is an ‘optional’ term
Future work
Because query parsing is centralized in QueryParser, new search syntax symbols can be added more easily to the current implementation.
One possible example: “!tag1”, meaning that results having ‘tag1’ as a metadata tag are excluded.