Confluent Cloud Plugin
- Vasudha Gupta
- Nitin Motgi
Introduction
Confluent Cloud is a managed streaming data service built on Apache Kafka.
User Story(s)
- As a pipeline developer, I would like to stream data in real time from Confluent Cloud based on a user-specified schema (a connection sketch follows this list)
- As a pipeline developer, I would like to stream data in real time to Confluent Cloud and perform serialization during event streaming
- As a pipeline developer, I would like to capture records that did not get delivered downstream to Confluent Cloud, so that they can be analyzed
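Both the source and the sink authenticate to a Confluent Cloud cluster with the API key and secret listed in the property tables below. For reference, a plain Kafka client would pass those credentials over SASL_SSL roughly as in this sketch; the class name, broker address, and credentials are illustrative, not part of this design.

```java
import java.util.Properties;

// Minimal client properties for reaching a Confluent Cloud cluster.
// The broker endpoint, API key, and secret are placeholders.
public class ConfluentCloudConfig {
  public static Properties baseProperties(String brokers, String apiKey, String apiSecret) {
    Properties props = new Properties();
    props.put("bootstrap.servers", brokers);    // maps to the "Kafka brokers" property
    props.put("security.protocol", "SASL_SSL"); // Confluent Cloud requires TLS + SASL
    props.put("sasl.mechanism", "PLAIN");       // the API key/secret act as PLAIN credentials
    props.put("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"" + apiKey + "\" password=\"" + apiSecret + "\";");
    return props;
  }
}
```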
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Prerequisites
- Kafka Broker: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
- Connect: Confluent Platform 3.3.0 or above, or Kafka 0.11.0 or above
- Java 1.8
Properties
Real-time source
Property | Description | Mandatory or Optional |
---|---|---|
referenceName | Uniquely identifies the source | Yes |
Kafka cluster credential API key | API key used to connect to the Confluent cluster | Yes |
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | Yes |
Kafka Zookeeper | The ZooKeeper connect string. Either this or the list of brokers is required | Required if brokers are not specified |
Kafka brokers | Comma-separated list of Kafka brokers. Either this or the ZooKeeper quorum is required | Required if ZooKeeper is not specified |
Kafka partition | Number of partitions | Yes |
Kafka offset | The initial offset for the partition | |
Kafka topic | List of topics to listen to for streaming | Yes |
Schema registry URL | URL endpoint of the schema registry, either on Confluent Cloud or self-hosted | No |
Schema registry API key | API key used to connect to the schema registry | No |
Schema registry secret | Secret used to connect to the schema registry | No |
Format | Format of the Kafka event. Any format supported by CDAP is accepted. By default, the key and value are output as bytes | No |
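When the schema registry properties are set, events can be deserialized as Avro against the registry. Below is a minimal consumer sketch under that assumption; the topic, endpoints, and credentials are placeholders, and ConfluentCloudConfig is the hypothetical helper from the earlier sketch.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SourceSketch {
  public static void main(String[] args) {
    Properties props = ConfluentCloudConfig.baseProperties(
        "broker1:9092", "API_KEY", "API_SECRET");
    props.put("group.id", "cdap-confluent-source");
    props.put("auto.offset.reset", "earliest"); // corresponds to the "Kafka offset" property
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    // Schema registry connection: URL plus API key/secret, matching the table above.
    props.put("schema.registry.url", "https://sr-endpoint");
    props.put("basic.auth.credentials.source", "USER_INFO");
    props.put("basic.auth.user.info", "SR_API_KEY:SR_API_SECRET");

    try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("events")); // "Kafka topic" property
      ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofSeconds(5));
      for (ConsumerRecord<String, GenericRecord> record : records) {
        System.out.printf("partition=%d offset=%d value=%s%n",
            record.partition(), record.offset(), record.value());
      }
    }
  }
}
```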
Real-time sink
Property | Description | Type | Mandatory |
---|---|---|---|
Reference Name | Uniquely identifies the sink | String | Yes |
Kafka cluster credential API key | API key used to connect to the Confluent cluster | String | Yes |
Kafka cluster credential secret | Secret used to connect to the Confluent cluster | String | Yes |
Kafka brokers | Comma-separated list of Kafka brokers | String | Yes |
Async | Specifies whether events are written to the broker asynchronously or synchronously | Select | Yes |
partitionfield | Specifies the input field used to determine the partition ID; the field must be of type int or long | Int or Long | Yes |
key | Specifies the input field to use as the key of the event published into Kafka | String | Yes |
Kafka topic | List of topics to which the data should be published | String | Yes |
format | Specifies the format of the events published to Confluent Cloud | String | Yes |
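The Async property determines whether a write blocks until the broker acknowledges it or returns immediately with a callback; the callback path is also a natural place to capture undelivered records for the analysis mentioned in the user stories. Below is a sketch of both modes, again using the hypothetical ConfluentCloudConfig helper; the topic, key, and error handling are illustrative only.

```java
import java.util.Properties;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SinkSketch {
  public static void main(String[] args) throws Exception {
    Properties props = ConfluentCloudConfig.baseProperties(
        "broker1:9092", "API_KEY", "API_SECRET");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Records that could not be delivered, kept aside for later analysis.
    Queue<ProducerRecord<String, String>> failed = new ConcurrentLinkedQueue<>();

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> record =
          new ProducerRecord<>("events", "order-42", "{\"amount\": 10}"); // key from the "key" field

      // Synchronous mode: block until the broker acknowledges the write.
      producer.send(record).get();

      // Asynchronous mode: the callback captures undelivered records.
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          failed.add(record);
        }
      });
      producer.flush();
    }
    System.out.println("undelivered records: " + failed.size());
  }
}
```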
References
- https://docs.confluent.io/current/connect/kafka-connect-bigquery/index.html
- https://docs.confluent.io/current/cloud/connectors/cc-gcs-sink-connector.html#cc-gcs-connect-sink
- https://docs.confluent.io/current/quickstart/cloud-quickstart/index.html
- https://docs.confluent.io/current/cloud/limits.html#cloud-limits
Design / Implementation Tips
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature