User Stories
- As a data pipeline user, I would like to be able to pull data from Salesforce in a streaming pipeline, transform it, and load it into a sink of my choice for analysis.
TBD
...
- Account
- Campaign
- Case
- Contact
- Lead
- Opportunity
- Task
- All custom objects
Salesforce pushTopic queries are constricted to below format:
SELECT <comma-separated list of fields> FROM <Salesforce object> WHERE <filter criteria>
Also:
- The SELECT statement’s field list must include Id.
- Only one object per query is allowed.
- The object must be valid for the specified API version.
Number of pushTopics is limited to 40-100 depending on Salesforce license type.
...
Code Block |
---|
{ "name": "SalesforceRealtime", "plugin": { "name": "SalesforceRealtime", "type": "realtimesource", "label": "SalesforceRealtime", "properties": { "clientId": "XXXXXXX", "clientSecret": "XXXXXXXXX", "username": "XXXXX", "password": "XXXXX", "topic": "/events/InvoiceStatementUpdates", "outputSchema": "Name:String", "loginUrl": "https://login.salesforce.com/services/oauth2/token", "pushTopicName": "", "pushTopicQuery": "", "PushTopicNofityCreate": "", "PushTopicNotifyUpdate": "", "PushTopicNotifyDelete": "", "PushTopicForFields": "Referenced" } } } |
Implementation
Salesforce streaming API will be used for realtime plugin.
Cometd BayeuxClient is used to subscribe to events. When event is received the data is added to a blockingQueue
.
Class SalesforceReceiver extends Receiver from Spark Streaming and is used as a receiver stream.
SalesforceReceiver#onStart:
- 1. Establish connection to
instance_url
/cometd/45.0 - 2. Start thread which moves data from
blockingQueue
to Spark Stream