Introduction

CDAP offers change data capture (CDC) via three different approaches:

  • GoldenGate for Oracle
  • LogMiner for Oracle
  • Change Tracking for SQL Server

All of these CDC mechanisms are supported via realtime data pipelines, and the plugins are available from the Hub. The CDC solution currently runs on Spark 1.x and has experimental support for BigTable.
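To make the idea of a CDC sink concrete, here is a minimal sketch of how a stream of change events could be applied to a keyed table. It is not the plugins' implementation: the `ChangeEvent` type, its fields, and `apply_changes` are hypothetical names chosen for illustration, and a plain dictionary stands in for a sink such as BigTable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; the real CDC plugins define their own schema."""
    op: str                                   # "INSERT", "UPDATE" or "DELETE"
    key: str                                  # primary-key value of the changed row
    row: Optional[Dict[str, object]] = None   # new column values; None for DELETE


def apply_changes(table: Dict[str, Dict[str, object]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, object]]:
    """Apply change events in order to an in-memory table keyed by primary key."""
    for event in events:
        if event.op in ("INSERT", "UPDATE"):
            # Both operations are treated as an upsert of the full row.
            table[event.key] = dict(event.row or {})
        elif event.op == "DELETE":
            # Deleting a missing key is a no-op, so replays stay harmless.
            table.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return table
```

Treating inserts and updates as upserts keeps the sink idempotent under replay, which is one common design choice for realtime pipelines; the actual plugins may handle this differently.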

Use case(s)

The scope of work is to make the CDC solution work with Spark 2.x and write to BigTable. Performance numbers (throughput and latency) should be published for these two destinations with all three CDC approaches.

  • ETL developers should be able to set up realtime pipelines to write data to BigTable/BigQuery
  • Users should get field level lineage for the source and sink that is being used
  • Reference documentation for the CDC plugins should be updated to account for the changes
  • The solution should run with all versions of Spark 2.x
  • Integration tests for specific plugins should be added in the test repos
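The throughput and latency numbers mentioned above could be summarized along the following lines. This is only a sketch under stated assumptions: each sample is a hypothetical pair of timestamps in seconds (when the change was committed at the source, and when it was written to the sink), throughput is taken as records divided by the span from first commit to last write, and the 95th percentile uses the nearest-rank method. The function name `summarize` and the result keys are illustrative, not part of the CDC solution.

```python
import math
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize (commit_ts, write_ts) pairs, both in seconds.

    Assumed definitions: latency = write_ts - commit_ts per record,
    throughput = records / (last write - first commit).
    """
    if not samples:
        raise ValueError("at least one sample is required")

    latencies = sorted(write - commit for commit, write in samples)
    start = min(commit for commit, _ in samples)
    end = max(write for _, write in samples)
    elapsed = end - start

    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(latencies)))

    return {
        "records": float(len(samples)),
        "throughput_per_sec": len(samples) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_sec": sum(latencies) / len(latencies),
        "p95_latency_sec": latencies[rank - 1],
    }
```

Running the same summary for each combination of CDC approach and destination would give directly comparable figures.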

Deliverables 

  • Source code in cask-solution/cdc repo
  • Performance tests for the three approaches with BigTable 
  • Integration test code 
  • Relevant documentation in the source repo and in the plugin's reference documentation section

Relevant links 

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Configurables

This section defines properties that are configurable for this plugin. 

User Facing Name | Type | Description | Constraints








Design / Implementation Tips

  • Tip #1
  • Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature