Introduction

The Google Drive plugins will let users move entire files from a source to a destination. Along the way, users can optionally run transformations on unstructured data such as images, audio, and video.

User Stories

  • As a pipeline developer, I want to move all files from a Google Drive directory to a different destination.
  • As a pipeline developer, I want to move all files from a Google Drive directory that satisfy a filter to a different destination.
  • As a pipeline developer, I want to pull all images from a Google Drive directory, so that I can process them using image recognition APIs.
  • As a pipeline developer, I want to pull all audio and video files from a Google Drive directory, so that I can process them to extract metadata, generate transcripts, or apply other enrichments.
  • As a pipeline developer, I want to move all files from an FTP source into Google Drive.

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Configurables

This section defines the configurable properties of the source and sink plugins.

Source

Basic

  • App Id (String, required): OAuth2 app ID.
  • Directory identifier (String, required): Identifier of the source folder. The ID is the last part of the folder URL; for example, in https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV the ID is 0B2kqcwp2ycGZanhSR3JmREw5VTV.
  • Filter (String, optional): A filter applied to the files in the selected directory. Filters follow the Google Drive filter (search query) syntax.
  • Modification date range (Select, optional; default: select): In addition to the filter above, pull only files that were modified within this date range.
  • Start date (Textbox, required): Shown only when "Modification date range" is set to "Custom". Start of the modification date range, in RFC 3339 format; the default time zone is UTC, e.g. 2012-06-04T12:00:00-08:00.
  • End date (Textbox, required): Shown only when "Modification date range" is set to "Custom". End of the modification date range, in RFC 3339 format; the default time zone is UTC, e.g. 2012-06-04T12:00:00-08:00.
  • File properties (Multi-select, optional): Properties to retrieve for each file in the directory. Allowed names are those of the Files resource in the Google Drive API.
  • File types to pull (Multi-select, optional; default: binary): Types of files to pull from the specified directory.
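The Basic options above map onto a Google Drive API v3 files.list request: the folder ID, filter, and modification dates can be combined into the q search string, and File properties into the fields partial-response parameter. The sketch below assumes the plugin composes the request this way; the helper names and the mime_prefixes argument (a hypothetical mapping of File types such as images, audio, or video to MIME-type prefixes) are illustrative, not part of this design.

```python
from datetime import datetime
from typing import Optional, Sequence
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the folder ID, i.e. the last path segment of a Drive folder URL."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def check_rfc3339(value: str) -> str:
    """Raise ValueError if value is not an RFC 3339-style timestamp; else return it."""
    # A trailing 'Z' is normalized because older Pythons reject it in fromisoformat.
    datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value


def build_query(folder_id: str,
                user_filter: Optional[str] = None,
                start: Optional[str] = None,
                end: Optional[str] = None,
                mime_prefixes: Sequence[str] = ()) -> str:
    """Join folder, filter, date-range and MIME-type clauses with 'and'."""
    clauses = [f"'{folder_id}' in parents"]
    if user_filter:
        # Parenthesize so an 'or' inside the user's filter stays scoped.
        clauses.append(f"({user_filter})")
    if start:
        clauses.append(f"modifiedTime >= '{check_rfc3339(start)}'")
    if end:
        clauses.append(f"modifiedTime <= '{check_rfc3339(end)}'")
    if mime_prefixes:
        # Hypothetical: File types are assumed to map to MIME prefixes like 'image/'.
        ors = " or ".join(f"mimeType contains '{p}'" for p in mime_prefixes)
        clauses.append(f"({ors})")
    return " and ".join(clauses)


def build_fields(properties: Sequence[str]) -> str:
    """Partial-response 'fields' value requesting only the selected file properties."""
    return f"nextPageToken,files({','.join(properties)})"
```

The resulting q and fields strings would be passed to files.list; when a response includes nextPageToken, the request is repeated with pageToken set to that value to fetch the next page.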
Authentication

  • Client ID (String, required): OAuth2 client ID.
  • Client secret (String, required): OAuth2 client secret.
  • Refresh token (String, required): OAuth2 refresh token.
  • Access token (String, required): OAuth2 access token.
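The four Authentication properties correspond to the standard OAuth 2.0 refresh-token flow: access tokens are short-lived, so the client ID, client secret, and refresh token let the plugin obtain a new access token when the configured one expires. A minimal sketch, assuming the plugin calls Google's token endpoint directly rather than through a client library; refresh_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def refresh_request(client_id: str, client_secret: str, refresh_token: str) -> Request:
    """Build (but do not send) the POST that exchanges a refresh token for an access token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return Request(TOKEN_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Sending the request with urllib.request.urlopen(req) returns a JSON body whose access_token and expires_in fields hold the new token and its lifetime in seconds.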

Advanced

  • Maximum partition size (Number, optional; default: 0): Maximum partition size, in bytes. The default value, 0, means unlimited.
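The design states only that Maximum partition size is a byte limit and that 0 means unlimited; it does not say how files are assigned to partitions. One possible interpretation, sketched below under the assumption that whole files are never split, is to pack files greedily in listing order until the next file would exceed the limit. partition_files and its (file_id, size) input are hypothetical.

```python
from typing import List, Sequence, Tuple


def partition_files(files: Sequence[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily group (file_id, size) pairs so each group's total stays within max_bytes.

    max_bytes == 0 means unlimited, so every file lands in a single partition.
    Files are never split (an assumption), so an oversized file gets its own partition.
    """
    if max_bytes == 0:
        return [[fid for fid, _ in files]] if files else []
    partitions: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for fid, size in files:
        # Close the current partition if adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(fid)
        current_size += size
    if current:
        partitions.append(current)
    return partitions
```

Under this reading, a single file larger than the limit still gets a partition of its own, so the limit bounds grouping rather than individual file size.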
Exporting

  • Google Documents export format (Select, optional; default: text/plain): MIME type to export Google Documents to. Allowed values are listed under "Downloading Google Documents" in the Google Drive API documentation.
  • Google Spreadsheets export format (Select, optional; default: text/csv): MIME type to export Google Spreadsheets to.
  • Google Drawings export format (Select, optional; default: image/svg+xml): MIME type to export Google Drawings to.
  • Google Presentations export format (Select, optional; default: text/plain): MIME type to export Google Presentations to.
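Native Google files (Docs, Sheets, Drawings, Slides) have no binary content to download, so they must be exported to one of the formats above with the Drive API files.export method, while other files can be downloaded directly with files.get and alt=media. The sketch below shows one way to pick the export type from a file's mimeType using the default values listed above; DEFAULT_EXPORT_FORMATS and export_mime_type are hypothetical names, and the overrides argument stands in for the user's selections.

```python
from typing import Dict, Optional

# Native Google MIME types mapped to the default export formats listed above.
DEFAULT_EXPORT_FORMATS: Dict[str, str] = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.drawing": "image/svg+xml",
    "application/vnd.google-apps.presentation": "text/plain",
}


def export_mime_type(file_mime_type: str,
                     overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the export MIME type for a native Google file.

    Returns None for regular files, which are downloaded as-is rather than exported.
    """
    formats = {**DEFAULT_EXPORT_FORMATS, **(overrides or {})}
    return formats.get(file_mime_type)
```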

Sink

Basic

  • App Id (String, required): OAuth2 app ID.
  • File name field (String, optional): Name of the input schema field (of type STRING) whose value is used as the file name. If it is not set, files get randomly generated 16-character names.
  • File body field (String, required): Name of the input schema field (of type BYTES) whose value is used as the file body. The minimal input schema contains only this field.
  • Directory identifier (String, required): Identifier of the destination folder. The ID is the last part of the folder URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV.
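The sink options above describe how one input record becomes one Drive file. A minimal sketch, assuming records arrive as Python mappings keyed by field name; record_to_file and random_name are hypothetical helpers, and the alphanumeric alphabet for the 16-character fallback names is an assumption, since the design does not specify which characters are used.

```python
import secrets
import string
from typing import Any, Mapping, Optional, Tuple

# Assumed alphabet; the design only says fallback names are 16 characters long.
_ALPHABET = string.ascii_letters + string.digits


def random_name(length: int = 16) -> str:
    """Random name used when no file name field is configured."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))


def record_to_file(record: Mapping[str, Any], body_field: str,
                   name_field: Optional[str] = None) -> Tuple[str, bytes]:
    """Turn one input record into a (file name, file body) pair for upload."""
    body = record[body_field]
    if not isinstance(body, (bytes, bytearray)):
        raise TypeError(f"field {body_field!r} must be BYTES")
    if name_field:
        name = record[name_field]
        if not isinstance(name, str):
            raise TypeError(f"field {name_field!r} must be STRING")
    else:
        name = random_name()
    return name, bytes(body)
```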
Authentication

  • Client ID (String, required): OAuth2 client ID.
  • Client secret (String, required): OAuth2 client secret.
  • Refresh token (String, required): OAuth2 refresh token.
  • Access token (String, required): OAuth2 access token.

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2




Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature