Introduction

Cloud Vision plugins will allow users to use pre-trained Vision API models to detect emotion, understand text, and more. They will be useful in enriching data with additional attributes such as labels, faces, etc.

NOTE: Using these plugins incurs the additional cost of calls to the Cloud Vision API.

Use case(s)

  • As a user, I want to extract various features from my images and documents using the Cloud Vision API, so that I can add ML-driven enrichments to my Data Fusion pipelines that process unstructured data.
  • As a user, I want easy, UI-driven ways of manipulating and understanding the output of the Cloud Vision API, so that I do not need to write any code for parsing it.

User Stories

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Transform
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Configurables 

File Path Batch Source

This source will read a directory and, instead of emitting records from the contents of the files in that directory, emit the path of each file as a record. It should work for object stores as well.

| Section | User Facing Name | Type | Description | Optional | Default |
|---|---|---|---|---|---|
| Basic | Path | textbox | The path to the directory where the files whose paths are to be emitted are located | No | |
| | Recursive | toggle | Whether the plugin should recursively traverse the directory for subdirectories | Yes | False |
| | Last Modified After | date-time picker | A way to filter files to be returned based on their last modified timestamp | Yes | 1/1/1970 (epoch) |
| Advanced | Split by | radio button | Determines splitting mechanisms. Choose amongst default (uses the default splitting mechanism of file input format), batch size (by number of files in a batch), directory (by each sub directory) | Yes | default |
| | Batch size | number | Specifies the number of files to process in a single batch. Only required when Split By is set to batch size. | | |
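
The behaviour described for this source can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plugin implementation: it walks a local file system rather than an object store, the names `list_file_paths` and `split_by_batch_size` are hypothetical, and "Last Modified After" is interpreted as a strict greater-than comparison on the modification time in epoch seconds.

```python
import os
from typing import Iterable, Iterator, List


def list_file_paths(root: str, recursive: bool = False,
                    last_modified_after: float = 0.0) -> Iterator[str]:
    """Yield one file path per would-be record (hypothetical helper).

    last_modified_after is in epoch seconds; the default of 0.0 mirrors
    the 1/1/1970 default in the table above.
    """
    if recursive:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) > last_modified_after:
                    yield path
    else:
        with os.scandir(root) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            if entry.is_file() and entry.stat().st_mtime > last_modified_after:
                yield entry.path


def split_by_batch_size(paths: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group paths into lists of at most batch_size (the 'batch size' split)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```

A caller would typically chain the two, for example `split_by_batch_size(list_file_paths("/data/images", recursive=True), 100)`, where the directory and batch size are placeholder values.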

Image Extractor transform

The image extractor transform can be used in conjunction with the file path batch source to extract enrichments from each image based on selected features.

It should send all errors to the error port.
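
One way to picture the error-port requirement is a loop that never lets a single bad image fail the whole run. The sketch below is an assumption-laden illustration: `route_records`, the `extract` callback, and the shape of the error record are all hypothetical and do not reflect the actual plugin API.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def route_records(records: Iterable[Record],
                  extract: Callable[[Record], Record]
                  ) -> Tuple[List[Record], List[Record]]:
    """Apply extract to each record; failures go to a separate error list.

    The second list stands in for the error port: each entry keeps the
    offending input record together with the error message.
    """
    outputs: List[Record] = []
    errors: List[Record] = []
    for record in records:
        try:
            enrichments = extract(record)
            outputs.append({**record, **enrichments})
        except Exception as exc:  # any per-record failure is routed, not raised
            errors.append({"record": record, "error": str(exc)})
    return outputs, errors
```

Keeping the original record alongside the message lets a downstream error handler retry or inspect the failing path.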

| Section | User Facing Name | Type | Description | Optional | Constraints | Default |
|---|---|---|---|---|---|---|
| Basic | Access Token | Password | Authentication token for Cloud Vision API. | No | | |
| | Field containing path | Text | Field in the input schema containing the path to the image. Defaults to 'path' | Yes | | Path |
| | Features | checkboxes | The features to extract from documents. Select from Text, Handwriting, Crop Hints, Faces, Image properties, Labels, Landmarks, Logos, Multiple Objects, Explicit Content | No | | |
| Advanced | Language Hints | multi-select | Optional hints to provide to Cloud Vision API in case it has trouble detecting the language of the text in the images. Only shown when the Text feature is selected. Select from supported languages | Yes | | None/Empty |
| | Aspect Ratios | multi-select | Aspect ratios as a decimal number, representing the ratio of the width to the height of the image. For example, if the desired aspect ratio is 4/3, the corresponding float value should be 1.33333. Only shown when Crop Hints is selected as a feature. If not specified, the best possible crop is returned. The number of provided aspect ratios is limited to a maximum of 16; any aspect ratios provided after the 16th are ignored. | Yes | | None |
| | Include Geo Results | toggle | Whether to include results derived from the geo information in the image. Only shown when Web Detection is selected as a feature | Yes | | false |
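
To make the relationship between these settings and the Cloud Vision REST API more concrete, the sketch below builds the JSON body for an `images:annotate` call from the selected options. It is a hedged illustration only: `build_annotate_request` and `FEATURE_TYPES` are hypothetical names, the mapping from UI labels to API feature types is an assumption (in particular, "Handwriting" is mapped to `DOCUMENT_TEXT_DETECTION`), and no request is actually sent.

```python
import base64
from typing import Any, Dict, Sequence

# Assumed mapping from the UI feature names to Cloud Vision feature types.
FEATURE_TYPES = {
    "Text": "TEXT_DETECTION",
    "Handwriting": "DOCUMENT_TEXT_DETECTION",
    "Crop Hints": "CROP_HINTS",
    "Faces": "FACE_DETECTION",
    "Image properties": "IMAGE_PROPERTIES",
    "Labels": "LABEL_DETECTION",
    "Landmarks": "LANDMARK_DETECTION",
    "Logos": "LOGO_DETECTION",
    "Multiple Objects": "OBJECT_LOCALIZATION",
    "Explicit Content": "SAFE_SEARCH_DETECTION",
    "Web Detection": "WEB_DETECTION",
}
MAX_ASPECT_RATIOS = 16  # ratios after the 16th are ignored, so drop them early


def build_annotate_request(image_bytes: bytes,
                           features: Sequence[str],
                           language_hints: Sequence[str] = (),
                           aspect_ratios: Sequence[float] = (),
                           include_geo_results: bool = False) -> Dict[str, Any]:
    """Build the JSON body for one image (hypothetical helper)."""
    request: Dict[str, Any] = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": FEATURE_TYPES[name]} for name in features],
    }
    context: Dict[str, Any] = {}
    # Each advanced option only applies when its feature is selected.
    if "Text" in features and language_hints:
        context["languageHints"] = list(language_hints)
    if "Crop Hints" in features and aspect_ratios:
        context["cropHintsParams"] = {
            "aspectRatios": list(aspect_ratios)[:MAX_ASPECT_RATIOS]
        }
    if "Web Detection" in features and include_geo_results:
        context["webDetectionParams"] = {"includeGeoResults": True}
    if context:
        request["imageContext"] = context
    return {"requests": [request]}
```

For a 4/3 crop, the caller would pass `aspect_ratios=[4 / 3]`, which serializes as roughly 1.33333. Truncating to 16 ratios on the client is a design choice in this sketch; the description above only says that extra ratios are ignored.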






File Extractor transform

| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Access Token | | | |
| | Features | checkboxes | The features to extract from documents | |

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature