User Stories
- As a data pipeline user, I would like to be able to pull data from Salesforce in a streaming pipeline, transform it, and load it into a sink of my choice for analysis.
TBD
Requirements
TBD: Bhooshan Mogal Albert Shau please fill these in
- PushTopic: should be optional. The plugin should attempt to create one if it is not specified. If creation fails, a human-readable error message should be displayed. At the end of the run, the PushTopic should be deleted if the plugin created it. If a user specifies an existing PushTopic, the plugin should not manage it.
- Should support the same Object and SOQL query requirements as the Salesforce Batch Source
- Login URL should be configurable
General Overview of Salesforce Streaming API
...
```
{
  "name": "SalesforceRealtime",
  "plugin": {
    "name": "SalesforceRealtime",
    "type": "realtimesource",
    "label": "SalesforceRealtime",
    "properties": {
      "clientId": "XXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "username": "XXXXX",
      "password": "XXXXX",
      "topic": "/events/InvoiceStatementUpdates",
      "outputSchema": "Name:String",
      "loginUrl": "https://login.salesforce.com/services/oauth2/token"
    }
  }
}
```
Implementation
The Salesforce Streaming API will be used for the realtime plugin.
CometD's BayeuxClient is used to subscribe to events. When an event is received, its data is added to a blockingQueue.
The SalesforceReceiver class extends Receiver from Spark Streaming and is used as the receiver stream.
SalesforceReceiver#onStart:
1. Establish a connection to instance_url/cometd/45.0
2. Start a thread which moves data from the blockingQueue to the Spark stream
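The two onStart steps above can be sketched with a simplified, self-contained receiver. This is an illustrative sketch, not the actual plugin code: the `SimpleReceiver` base class stands in for Spark's `Receiver` (whose `store()` hands records to Spark), the CometD handshake and subscription are only indicated by comments, and the hypothetical `onMessage` method plays the role of the BayeuxClient channel listener that enqueues incoming events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stand-in for Spark Streaming's Receiver; store() would hand records to Spark.
abstract class SimpleReceiver {
    final List<String> stored = new ArrayList<>();
    void store(String record) {
        synchronized (stored) { stored.add(record); }
    }
    abstract void onStart();
    abstract void onStop();
}

class SalesforceReceiver extends SimpleReceiver {
    private final BlockingQueue<String> blockingQueue = new LinkedBlockingQueue<>();
    private volatile boolean running;
    private Thread drainThread;

    // Hypothetical callback: the CometD message listener would call this
    // for each event received on the subscribed channel.
    void onMessage(String eventJson) {
        blockingQueue.offer(eventJson);
    }

    @Override
    void onStart() {
        // Step 1 (not shown here): BayeuxClient handshake against
        // instance_url/cometd/45.0 and channel subscription.
        running = true;
        // Step 2: drain the blockingQueue into the Spark stream on its own thread.
        drainThread = new Thread(() -> {
            while (running) {
                try {
                    String record = blockingQueue.poll(100, TimeUnit.MILLISECONDS);
                    if (record != null) {
                        store(record);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        drainThread.start();
    }

    @Override
    void onStop() {
        running = false;
        try {
            drainThread.join();
        } catch (InterruptedException ignored) {
        }
    }
}
```

Decoupling the CometD listener from `store()` through a queue keeps the event callback fast and lets Spark consume records at its own rate; the real implementation must also handle reconnects and stop the drain thread in `onStop`.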