Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
Simplify and improve the user experience of cloud-based plugins such as GCS, S3, and BigQuery for CDAP data pipelines running in a cloud environment.
Goals
When CDAP is provisioned in a cloud environment such as Google Cloud or AWS, provide autofill suggestions for fields such as bucket_name and dataset_id in the GCS, BigQuery, and S3 plugins used in data pipelines.
When CDAP is provisioned in a cloud environment such as Google Cloud or AWS, disable fields such as credentials and project_information in the GCS, BigQuery, and S3 plugins, since these values are available by default for the CDAP instance.
User Stories
- A user who has provisioned CDAP on GCP and is configuring the Google Cloud Storage plugin in a data pipeline expects auto-suggestions for the bucket_name and path fields, based on the buckets they have access to.
- A user who has provisioned CDAP on GCP and is configuring the Google BigQuery plugin in a data pipeline expects auto-suggestions for the dataset_id and table_id fields, based on the datasets they have access to.
- As a plugin developer, I would like to mark certain fields of Google Cloud plugins as disabled in a GCP environment to improve the user experience.
- As a plugin developer, I would like to mark certain fields of AWS plugins as disabled in an AWS environment to improve the user experience.
- A user who has provisioned CDAP on GCP and is configuring the GCS or BigQuery plugin in a data pipeline does not expect to provide credentials or project information; they expect these fields to be hidden or disabled.
Design
Autofill suggestions for cloud plugins
When users use data-prep to wrangle their data and then create a pipeline from data-prep, the source plugin and its fields are filled in automatically, and the user doesn’t have to provide them again.
However, a user can also start with the GCS, BigQuery, or S3 plugins directly in the data-pipeline view. We would like to improve that experience in the cloud by providing autofill suggestions with available values for fields such as bucket and dataset_id.
CDAP plugins have an endpoints capability through which they expose additional functionality. It is typically used to retrieve schema information; for example, the Database plugin executes a query and returns the schema, which the UI then uses as the output schema for the database source.
We can leverage the existing plugin endpoints feature to add endpoints to the cloud plugins that list buckets, paths, datasets, and so on.
    @Path("bucketList")
    public List<String> getBucketList(String project) {
      List<String> buckets = new ArrayList<>();
      // Build a Storage client scoped to the given project.
      Storage storage = StorageOptions.newBuilder()
        .setProjectId(project)
        .build()
        .getService();
      Page<Bucket> list = storage.list();
      // getValues() covers only the current page of results.
      for (Bucket bucket : list.getValues()) {
        buckets.add(bucket.getName());
      }
      return buckets;
    }
Note:
The plugin endpoint backend expects an endpoint method to take either one parameter, or two parameters where the second is an EndpointPluginContext.
However, supporting autofill for requests such as project information, which have no input field or input parameter, might require a platform change.
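The signature rule in the note can be illustrated with a small reflection check. This is a minimal sketch under stated assumptions, not CDAP's actual dispatch code: `EndpointPluginContext` here is an empty stand-in interface, and `SamplePlugin`, `isSupportedEndpoint`, and `isSupported` are hypothetical names introduced for illustration. It shows why a zero-parameter method, such as a project lookup, would not fit the current rule.

```java
import java.lang.reflect.Method;

class EndpointSignatureCheck {

    /** Empty stand-in for CDAP's EndpointPluginContext (assumption, illustration only). */
    interface EndpointPluginContext { }

    /** Hypothetical plugin with three candidate endpoint methods. */
    static class SamplePlugin {
        public String bucketList(String project) { return project; }
        public String tableList(String dataset, EndpointPluginContext context) { return dataset; }
        public String projectInfo() { return "no input"; }
    }

    /** True if the method has one parameter, or two with EndpointPluginContext second. */
    static boolean isSupportedEndpoint(Method method) {
        Class<?>[] params = method.getParameterTypes();
        if (params.length == 1) {
            return true;
        }
        return params.length == 2 && EndpointPluginContext.class.isAssignableFrom(params[1]);
    }

    /** Finds a declared method by name and applies the check; false if not found. */
    static boolean isSupported(Class<?> pluginClass, String methodName) {
        for (Method method : pluginClass.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                return isSupportedEndpoint(method);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"bucketList", "tableList", "projectInfo"}) {
            System.out.println(name + " supported: " + isSupported(SamplePlugin.class, name));
        }
    }
}
```

Running `main` reports `bucketList` and `tableList` as supported and `projectInfo` as unsupported, which is the gap the note describes.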
Disabling unnecessary fields in cloud plugins
When CDAP is running in a cloud environment, allowing users to edit certain fields such as credentials and project_information can be confusing or misleading. Disabling these fields makes the configuration more robust.
Approach #1 - Annotating fields in plugin config with disabled (Considered)
We want to allow plugin developers to tag certain fields as disabled in certain runtime environments.
    public static class GCSSourceConfig extends FileSourceConfig {
      @Name("project")
      @Description("Project ID")
      @Macro
      @Nullable
      @Disabled("GCS")
      public String project;

      @Name("serviceFilePath")
      @Description("Service account file path.")
      @Macro
      @Nullable
      @Disabled("GCS")
      public String serviceAccountFilePath;
    }
This requires a platform change: adding a new field to the PluginPropertyField class that lists the environments in which the property is disabled.
    public class PluginPropertyField {
      // existing fields
      private final String name;
      private final String description;
      private final String type;
      private final boolean required;
      private final boolean macroSupported;
      private final boolean macroEscapingEnabled;
      // New field
      private final List<String> disabled;
    }
When the CDAP UI receives the plugin config, if the disabled list is not empty and contains the current CDAP environment, the UI can disable the corresponding field.
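The check described above can be sketched as follows. This is an illustrative sketch only: `PropertyField` is a trimmed-down, hypothetical stand-in for PluginPropertyField, `isDisabled` is an assumed helper name, and the environment identifiers `"GCP"` and `"AWS"` are assumptions about how the platform might name environments. In practice this logic would live in the UI; Java is used here to match the plugin code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DisabledFieldCheck {

    /** Hypothetical stand-in for PluginPropertyField carrying the proposed list. */
    static class PropertyField {
        final String name;
        final List<String> disabled;

        PropertyField(String name, List<String> disabled) {
            this.name = name;
            // Treat a missing list as "disabled nowhere".
            this.disabled = disabled == null ? Collections.<String>emptyList() : disabled;
        }
    }

    /** Disabled when the list is non-empty and contains the current environment. */
    static boolean isDisabled(PropertyField field, String currentEnvironment) {
        return !field.disabled.isEmpty() && field.disabled.contains(currentEnvironment);
    }

    public static void main(String[] args) {
        PropertyField project = new PropertyField("project", Arrays.asList("GCP"));
        PropertyField path = new PropertyField("path", null);
        System.out.println(isDisabled(project, "GCP")); // true
        System.out.println(isDisabled(project, "AWS")); // false
        System.out.println(isDisabled(path, "GCP"));    // false
    }
}
```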
Approach #2 - Providing disabled information through widget properties (Preferred)
Add a new widget property in the UI for marking a field as disabled, which plugin developers can use when writing their widget JSON.
Example: in the GCSFile-batchsource.json widget, the credential field will have an additional property for disabling it in GCP.
Widget properties are stored as plugin properties in the artifact store. The UI queries the widget properties, which carry additional information about fields and their widget types.
If the widget properties mark the credential field as disabled in GCP and the environment is GCP, the UI can use that information to disable the field.
    {
      "label" : "Service Account and Project",
      "properties" : [
        {
          "widget-type": "textbox",
          "label": "Service Account File Path",
          "name": "serviceFilePath",
          "widget-attributes" : {
            "placeholder": "Path to service account file (Local to host running on).",
            "disabled": "GCP"
          }
        },
        {
          "widget-type": "textbox",
          "label": "Project Id",
          "name": "project",
          "widget-attributes" : {
            "placeholder": "The Project Id of GCS.",
            "disabled": "GCP"
          }
        },
        {
          "widget-type": "textbox",
          "label": "Bucket Name",
          "name": "bucket",
          "widget-attributes" : {
            "placeholder": "Temporary Google Cloud Storage bucket name."
          }
        }
      ]
    }
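To show how the widget JSON above might be consumed, here is a rough sketch that selects the fields to disable for a given environment. It assumes the widget-attributes have already been parsed into maps keyed by field name; `disabledFields`, `gcsExample`, and `attrs` are hypothetical helper names, and the real logic would sit in the UI rather than in Java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WidgetDisabledFields {

    /** Returns the names of fields whose "disabled" attribute equals the environment. */
    static List<String> disabledFields(Map<String, Map<String, String>> widgetAttributes,
                                       String environment) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : widgetAttributes.entrySet()) {
            String disabledIn = entry.getValue().get("disabled");
            if (environment.equals(disabledIn)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Mirrors the widget JSON example: field name -> relevant widget-attributes. */
    static Map<String, Map<String, String>> gcsExample() {
        Map<String, Map<String, String>> fields = new LinkedHashMap<>();
        fields.put("serviceFilePath", attrs("disabled", "GCP"));
        fields.put("project", attrs("disabled", "GCP"));
        fields.put("bucket", new LinkedHashMap<String, String>());
        return fields;
    }

    private static Map<String, String> attrs(String key, String value) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(disabledFields(gcsExample(), "GCP")); // [serviceFilePath, project]
        System.out.println(disabledFields(gcsExample(), "AWS")); // []
    }
}
```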
Why Approach #2 is preferred
Marking a field as disabled is a very UI-specific feature, unlike other annotations used on plugin config property fields such as @Nullable or @Macro. Whereas @Nullable and @Macro have backend logic, disabled only affects the user experience. Hence Approach #2 seems the more reasonable solution.
Note:
For either approach, we need a way for the CDAP UI to determine which platform the CDAP instance is running on.
UI Impact or Changes
- Autofill suggestions for cloud plugin fields will require a new UI widget type. The current widget for plugin endpoints relies on an explicit button to get the schema; the new widget type would make an implicit endpoint call for those fields, similar to the stream/dataset selector.
- A new UI widget attribute for disabled fields, which the UI understands and uses to disable those fields.
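One possible behavior for the autofill widget, once an implicit endpoint call has returned the available values, is to narrow them by what the user has typed. The sketch below assumes case-insensitive prefix matching and a result cap; both are illustrative choices not specified in this design, and `suggest` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AutofillSuggestions {

    /** Returns up to `limit` candidates that start with the typed text, ignoring case. */
    static List<String> suggest(List<String> candidates, String typed, int limit) {
        String prefix = typed == null ? "" : typed.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (matches.size() >= limit) {
                break;
            }
            if (candidate.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Example values standing in for the result of a bucket-list endpoint call.
        List<String> buckets = Arrays.asList("sales-raw", "sales-clean", "logs");
        System.out.println(suggest(buckets, "sal", 10)); // [sales-raw, sales-clean]
        System.out.println(suggest(buckets, "", 2));     // [sales-raw, sales-clean]
    }
}
```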