Introduction

Google Sheets plugins will allow users to import data from Google Sheets into their pipelines, so that they can transform it and enrich it with data from other sources.

User Stories

  • As a pipeline developer, I want to import data from Google Sheets, so that I can transform and enrich it using CDAP
  • As a pipeline developer, I want to move all sheets from a given Google Drive directory to a destination
  • As a pipeline developer, I want to be able to pick certain sheets from a particular spreadsheet to process using CDAP, so that I do not have to process all sheets every time. I want to be able to specify each sheet by its name or number.
  • As a pipeline developer, I want to treat the first row of a sheet as a header, so that CDAP can automatically treat it as schema
  • As a pipeline developer, I want to be able to specify a section at the top of my sheet as a header, so that it is extracted as metadata, and not as actual data
  • As a pipeline developer, I want to be able to specify a section at the bottom of my sheet as a footer, so that it is extracted as metadata, and not as actual data
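
The story about picking sheets by name or number can be sketched as follows. This is only an illustration, not the plugin's implementation: the function name `select_sheets` is hypothetical, and it assumes that sheet numbers are 1-based positions in spreadsheet order, which the stories above do not specify.

```python
def select_sheets(available, selectors=None):
    """Return the sheet titles to process, in spreadsheet order.

    available: sheet titles in the order they appear in the spreadsheet.
    selectors: sheet names (str) or sheet numbers (int, assumed 1-based).
               None or an empty list means "all sheets".
    """
    if not selectors:
        return list(available)

    chosen = set()
    for sel in selectors:
        if isinstance(sel, int):
            # Assumption: numbers are 1-based positions.
            if not 1 <= sel <= len(available):
                raise ValueError(f"sheet number {sel} is out of range")
            chosen.add(available[sel - 1])
        elif sel in available:
            chosen.add(sel)
        else:
            raise ValueError(f"unknown sheet name: {sel!r}")

    # Preserve spreadsheet order and drop duplicates.
    return [title for title in available if title in chosen]
```

For example, `select_sheets(["Q1", "Q2", "Q3"], ["Q3", 1])` selects the first and third sheets, and omitting the selectors selects all of them.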

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Configurables

Source

This section defines properties that are configurable for this plugin. 

| User Facing Name | Type | Description | Optional | Constraints |
|---|---|---|---|---|
| App Id | string | | No | |
| Access token | string | | No | |
| Directory Id | string | Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV | No | |
| Filter | String | A filter that can be applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax | Yes | |
| Modification date range | String | In addition to the filter specified above, also filter files to only pull those that were modified within the date range. Defaults to last year. | Yes | |
| Sheets to pull | multi-select | Select from a list of sheets to pull. Defaults to all. | Yes | |
| Header selection | Radio buttons | Choose between No Headers, Treat first row as header, Custom header. Defaults to No Headers | Yes | |
| Custom header first row | Number | Only shown when the header selection is set to Custom header. Accepts the row number of the first row to be treated as a header. Defaults to 0. | Yes | |
| Custom header last row | Number | Only shown when the header selection is set to Custom header. Accepts the row number of the last row to be treated as a header. Defaults to 0. | Yes | |
| Footer selection | Radio buttons | Choose between No Footer, Custom footer. Defaults to No Footer | Yes | |
| Custom footer first row | Number | Only shown when the footer selection is set to Custom footer. Accepts the row number of the first row to be treated as a footer. | Yes | |
| Custom footer last row | Number | Only shown when the footer selection is set to Custom footer. Accepts the row number of the last row to be treated as a footer. | Yes | |
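
As a rough illustration of how the Directory Id, Filter, and Modification date range properties might combine, the sketch below builds a Google Drive search query string. The helper names are hypothetical, and two points are assumptions rather than part of this spec: "last year" is read as the past 365 days, and the user filter is simply AND-ed with the other clauses. The query terms (`in parents`, `mimeType`, `trashed`, `modifiedTime`) come from the Drive search query syntax.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

SHEETS_MIME_TYPE = "application/vnd.google-apps.spreadsheet"


def directory_id_from_url(url):
    """Return the last path segment of a Drive folder URL."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def build_query(directory_id, user_filter=None,
                modified_after=None, modified_before=None, now=None):
    """Build a Drive search query; datetimes are expected in UTC."""
    now = now or datetime.now(timezone.utc)
    if modified_before is None:
        modified_before = now
    if modified_after is None:
        # Assumption: "defaults to last year" means the past 365 days.
        modified_after = modified_before - timedelta(days=365)

    fmt = "%Y-%m-%dT%H:%M:%S"
    clauses = [
        f"'{directory_id}' in parents",
        f"mimeType = '{SHEETS_MIME_TYPE}'",
        "trashed = false",
        f"modifiedTime >= '{modified_after.strftime(fmt)}'",
        f"modifiedTime <= '{modified_before.strftime(fmt)}'",
    ]
    if user_filter:
        # Parenthesize so an "or" in the user filter cannot widen the query.
        clauses.append(f"({user_filter})")
    return " and ".join(clauses)
```

The resulting string would be passed as the search query when listing files in the directory; how the plugin actually issues that request is outside the scope of this sketch.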

Note: The data in the specified header and footer rows should not be available as records to the rest of the pipeline. It should be stored as metadata.
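
The separation described in the note can be sketched as below. This is a minimal illustration under stated assumptions: the function name `split_rows` and the mode strings are hypothetical, and row numbers are taken to be 1-based and inclusive, which the property descriptions do not settle (they only say the custom header rows default to 0).

```python
def _span(bounds, total):
    """Convert a 1-based inclusive (first, last) pair to 0-based row indexes."""
    if not bounds:
        return range(0)
    first, last = bounds
    if first < 1 or last < first:
        raise ValueError(f"invalid row range: {bounds}")
    return range(first - 1, min(last, total))


def split_rows(rows, header_mode="none", header_rows=None, footer_rows=None):
    """Separate header/footer metadata from the data records.

    header_mode: "none", "first_row", or "custom" (hypothetical names).
    header_rows, footer_rows: (first, last) row numbers, assumed 1-based.
    """
    column_names = None
    header_idx = set()
    if header_mode == "first_row" and rows:
        # The first row supplies the schema and is not emitted as a record.
        column_names = list(rows[0])
        header_idx = {0}
    elif header_mode == "custom":
        header_idx = set(_span(header_rows, len(rows)))
    footer_idx = set(_span(footer_rows, len(rows)))

    skipped = header_idx | footer_idx
    return {
        "column_names": column_names,
        "header": [rows[i] for i in sorted(header_idx)] if header_mode == "custom" else [],
        "footer": [rows[i] for i in sorted(footer_idx)],
        "records": [row for i, row in enumerate(rows) if i not in skipped],
    }
```

Only the `records` entry would flow to the rest of the pipeline; `header` and `footer` hold the rows to be stored as metadata.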

Sink

| User Facing Name | Type | Description | Optional | Constraints |
|---|---|---|---|---|
| App Id | string | | No | |
| Access token | string | | No | |
| Directory Id | string | Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV | No | |
| Sheet name | string | Name of the sheet. Defaults to Sheet 1 | Yes | |
| Write first row as headers | Toggle | If true, the schema is written as the first row of the sheet. Defaults to True. | Yes | |
| Format for nested data | select | Choose amongst JSON, CSV. Format to serialize complex (nested) data as. Defaults to JSON. | Yes | |

Note: Incoming records should be written to columns in the sheet
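
One way to picture the sink behavior is the sketch below, which turns incoming records into sheet rows. It is an illustration only: the function names are hypothetical, records are modeled as plain dictionaries, each record is assumed to become one row with one field per column, and the CSV form of a nested value is assumed to be its values joined on a single line.

```python
import csv
import io
import json


def serialize_nested(value, fmt="JSON"):
    """Serialize a nested value (dict or list) into a single cell value."""
    if not isinstance(value, (dict, list)):
        return value
    if fmt == "JSON":
        return json.dumps(value)
    if fmt == "CSV":
        # Assumption: for a dict, only the values are written, in field order.
        items = list(value.values()) if isinstance(value, dict) else value
        buf = io.StringIO()
        csv.writer(buf).writerow(items)
        return buf.getvalue().rstrip("\r\n")
    raise ValueError(f"unsupported format: {fmt}")


def records_to_rows(records, field_names, write_header=True, nested_format="JSON"):
    """Map records to rows, optionally writing the schema as the first row."""
    rows = [list(field_names)] if write_header else []
    for record in records:
        rows.append([serialize_nested(record.get(name), nested_format)
                     for name in field_names])
    return rows
```

With "Write first row as headers" enabled, the first returned row carries the field names and each later row carries one record.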

Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature