Confluent Cloud Plugin

Introduction

Confluent Cloud is a streaming data service built on Apache Kafka, delivered as a fully managed service. 

User Story(s)

  • As a pipeline developer, I would like to stream data in real time from Confluent Cloud based on a user-specified schema
  • As a pipeline developer, I would like to stream data in real time to Confluent Cloud and perform serialization during event streaming
  • As a pipeline developer, I would like to capture records that did not get delivered downstream to Confluent Cloud, for analysis (see the sketch after this list)
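
One way to satisfy the last story is a producer callback that sets aside records whose sends fail. A minimal sketch, assuming a plain Java Kafka producer; the broker address, credential placeholders, topic name, and the failedRecords queue are hypothetical stand-ins for whatever error dataset the plugin would actually write to:

import java.util.Properties;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FailedRecordCapture {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "pkc-xxxxx.us-west2.gcp.confluent.cloud:9092"); // placeholder
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<CLUSTER_API_KEY>\" password=\"<CLUSTER_API_SECRET>\";");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Hypothetical holding area for records that could not be delivered.
        Queue<ProducerRecord<String, String>> failedRecords = new ConcurrentLinkedQueue<>();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("events", "key-1", "payload-1");
            // The callback fires once the broker acks or the send fails;
            // undelivered records are kept aside for later analysis.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    failedRecords.add(record);
                }
            });
        }
        System.out.println("Records not delivered: " + failedRecords.size());
    }
}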

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Prerequisites

  • Kafka Broker: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
  • Connect: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
  • Java 1.8

Properties

Real time source

Property | Description | Mandatory or Optional
referenceName | Uniquely identifies the source | Yes
Kafka cluster credential API key | API key used to connect to the Confluent cluster | Yes
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | Yes
Kafka ZooKeeper | The connect string location of ZooKeeper. Either this or the list of brokers is required | Required if brokers not specified
Kafka brokers | Comma-separated list of Kafka brokers. Either this or the ZooKeeper quorum is required | Required if ZooKeeper not specified
Kafka partition | Number of partitions | Yes
Kafka offset | The initial offset for the partition | 
Kafka topic | List of topics to listen to for streaming | Yes
Schema registry URL | URL endpoint for the schema registry, either on Confluent Cloud or self-hosted | No
Schema registry API key | API key for the schema registry | No
Schema registry secret | Secret for the schema registry | No
Format | Format of the Kafka event. Any format supported by CDAP is supported. By default, the key and value are output as bytes | No
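
For reference, the cluster credential and schema registry properties above correspond to standard Kafka client settings for Confluent Cloud. A minimal consumer sketch, assuming the default bytes format; the endpoints, credentials, topic, and group id are all placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ConfluentSourceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "Kafka brokers" property
        props.put("bootstrap.servers", "pkc-xxxxx.us-west2.gcp.confluent.cloud:9092");
        // "Kafka cluster credential API key" / "secret" properties
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<CLUSTER_API_KEY>\" password=\"<CLUSTER_API_SECRET>\";");
        // "Schema registry URL" / "API key" / "secret" properties; these are
        // read by Confluent's Avro deserializer when the Format property
        // selects a schema-based format, and ignored by the byte-array
        // deserializers used here.
        props.put("schema.registry.url", "https://psrc-xxxxx.us-central1.gcp.confluent.cloud");
        props.put("basic.auth.credentials.source", "USER_INFO");
        props.put("basic.auth.user.info", "<SR_API_KEY>:<SR_API_SECRET>");
        // Default "Format": key and value as bytes
        props.put("group.id", "cdap-confluent-source");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // "Kafka topic" property
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<byte[], byte[]> record : records) {
                System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
            }
        }
    }
}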


Real time sink

Property | Description | Type | Mandatory
Reference Name | Uniquely identifies the sink | String | Yes
Kafka cluster credential API key | API key used to connect to the Confluent cluster | String | Yes
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | String | Yes
Kafka brokers | Comma-separated list of Kafka brokers | String | Yes
Async | Specifies whether events are written to the broker asynchronously or synchronously | Select | Yes
partitionfield | Specifies the input field used to determine the partition id | Int or Long | Yes
key | Specifies the input field that should be used as the key for the event published into Kafka | String | Yes
Kafka topic | List of topics to which the data should be published | String | Yes
format | Specifies the format of the event published to Confluent Cloud | String | Yes
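
The Async property maps onto how the underlying producer send is handled: the same send() call is either awaited (synchronous) or left to complete in the background (asynchronous). A minimal sketch, assuming a plain Java Kafka producer; the broker address, credentials, and topic are placeholders, and the explicit partition id stands in for the partitionfield property:

import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConfluentSinkSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "pkc-xxxxx.us-west2.gcp.confluent.cloud:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"<CLUSTER_API_KEY>\" password=\"<CLUSTER_API_SECRET>\";");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        boolean async = false; // the sink's "Async" property

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "key" property -> record key; "partitionfield" would resolve to
            // the explicit partition id passed as the second argument here.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("events", 0, "user-42", "{\"event\":\"click\"}");
            Future<RecordMetadata> future = producer.send(record);
            if (!async) {
                RecordMetadata metadata = future.get(); // block until the broker acks
                System.out.println("wrote offset " + metadata.offset());
            }
        }
    }
}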


References

https://docs.confluent.io/current/connect/kafka-connect-bigquery/index.html

https://docs.confluent.io/current/cloud/connectors/cc-gcs-sink-connector.html#cc-gcs-connect-sink

https://docs.confluent.io/current/quickstart/cloud-quickstart/index.html

https://docs.cask.co/cdap/4.2.0/en/developer-manual/pipelines/plugins/sinks/kafkaproducer-realtimesink.html

https://docs.cask.co/cdap/4.2.0/en/developer-manual/pipelines/plugins/sources/kafka-realtimesource.html

https://docs.confluent.io/current/cloud/limits.html#cloud-limits



Design / Implementation Tips

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature