The Salesforce batch sink is responsible for using the Salesforce API to insert/upsert/update Salesforce objects. The sink should handle large batches of data (~10 GB) and all object types: contacts, campaigns, opportunities, leads, custom objects. Users should be able to upload a subset of fields.
...
Section | User Configuration Label | Description | Default | User Widget | Early Validations |
---|---|---|---|---|---|
Authentication | Username | Salesforce username | | Text Box | Try to log in to the Bulk API with the given credentials. |
Authentication | Password | | | Password | |
Authentication | Consumer Key | Consumer Key from the connected app | | Text Box | |
Authentication | Consumer Secret | Consumer Secret from the connected app | | Password | |
Authentication | Login Url | Salesforce sandbox runs use a different login URL, so the user needs this option. | https://login.salesforce.com/services/oauth2/token | Text Box | |
Advanced | SObject | Name of the Salesforce sObject, e.g. Contact, Campaign, Opportunity. | | Text Box | Check that an sObject with the given name exists in the Bulk API. |
Advanced | Operation | Possible values are: Insert, Upsert, Update. | Insert | Select | If the operation is upsert or update, validate that the input schema contains the Id/external id field. |
Advanced | Upsert external id field | External id field name. Used only for upsert. [5] | | Text Box | If empty and the operation is upsert, fail. If not empty and the operation is insert or update, fail. |
Advanced | Maximum bytes per batch | If the batch data is larger than the given number of bytes, split the batch. | 10,000,000 [2] | Text Box | If more than 10,000,000, fail. [3] |
Advanced | Maximum records per batch | If there are more than the given number of records, split the batch. | 10,000 [2] | Text Box | If more than 10,000, fail. [4] |
Advanced | Error handling | The Bulk API returns results per row, so this option is necessary [1] (unlike for source plugins). Possible values: "Skip on error" - ignores any reports about records that were not inserted and simply prints an error log. | Skip on error | Select | |
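
The early validations for the two batch limits could look roughly like the sketch below. The class name `BatchLimits` and the method names are hypothetical, and the rejection of non-positive values is an added assumption that the table does not state; only the upper bounds come from the table.

```java
// Hypothetical helper, not part of the plugin: early validation of batch limits.
final class BatchLimits {
    // Upper bounds taken from the configuration table.
    static final long MAX_BYTES_PER_BATCH = 10_000_000L;
    static final long MAX_RECORDS_PER_BATCH = 10_000L;

    private BatchLimits() { }

    /** Throws IllegalArgumentException when a configured limit cannot work. */
    static void validate(long maxBytesPerBatch, long maxRecordsPerBatch) {
        check("Maximum bytes per batch", maxBytesPerBatch, MAX_BYTES_PER_BATCH);
        check("Maximum records per batch", maxRecordsPerBatch, MAX_RECORDS_PER_BATCH);
    }

    private static void check(String label, long value, long limit) {
        // Assumption: zero or negative limits are rejected as well.
        if (value <= 0) {
            throw new IllegalArgumentException(label + " must be positive, got " + value);
        }
        if (value > limit) {
            throw new IllegalArgumentException(
                label + " is " + value + " but must not exceed " + limit);
        }
    }
}
```

Failing at configuration time keeps the user from discovering the problem only after batches have been rejected.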
...
[4] According to the Bulk API limitations, a batch will fail if it has more than 10,000 records. So if the user sets maximum records to more than 10,000, we should tell the user that this does not make sense by failing the pipeline.
[5] See the Upsert and update section.
Salesforce Bulk API for INSERT and how we use it.
...
Ask the Bulk API to create a job for us and return its id, so that we can submit batches to it later.
The job type is set to insert, not upsert, because for an upsert we need to know the ID of every record we upsert or update.
STEP 3. Split data into batches
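As a minimal sketch of how splitting by the two configured limits might work, assuming each record has already been serialized to one row string: the class `BatchSplitter` is hypothetical, and the byte count ignores any header line or line separators a real batch would include.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the plugin: groups rows into batches.
final class BatchSplitter {
    private BatchSplitter() { }

    /** Splits rows so that no batch exceeds maxBytes (UTF-8) or maxRecords. */
    static List<List<String>> split(List<String> rows, long maxBytes, int maxRecords) {
        if (maxBytes < 1 || maxRecords < 1) {
            throw new IllegalArgumentException("limits must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : rows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            if (rowBytes > maxBytes) {
                // A single row that cannot fit in any batch is reported early.
                throw new IllegalArgumentException("row of " + rowBytes + " bytes exceeds limit");
            }
            boolean full = current.size() >= maxRecords || currentBytes + rowBytes > maxBytes;
            if (full && !current.isEmpty()) {
                // Close the current batch and start a new one.
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A batch is closed as soon as adding the next row would break either limit, so both limits hold for every batch produced.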
...
Validating schema
- An sObject contains many fields that cannot be inserted (non-creatable fields), such as Id, IsDeleted, LastModifiedDate and many others that are usually auto-generated.
...
- We do the early validation and check if schema contains fields which are not present or not creatable in target sObject.
- If operation is upsert, check if external id field is in schema.
- If operation is update, check if Id field is in schema.
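The schema checks listed above could be sketched as follows. All names here (`SchemaValidator`, `Operation`, the parameters) are hypothetical, field names are compared case-sensitively for simplicity, and the set of creatable fields is assumed to have been fetched from the sObject description beforehand. Exempting the key field (Id or the external id field) from the creatable check is also an assumption, made because Id itself is non-creatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the plugin: early schema validation.
final class SchemaValidator {
    enum Operation { INSERT, UPSERT, UPDATE }

    private SchemaValidator() { }

    /** Returns a list of problems; an empty list means the schema passed. */
    static List<String> validate(Set<String> schemaFields, Set<String> creatableFields,
                                 Operation operation, String externalIdField) {
        List<String> errors = new ArrayList<>();
        String keyField = null;
        if (operation == Operation.UPSERT) {
            if (externalIdField == null || externalIdField.isEmpty()) {
                errors.add("Upsert requires an external id field name");
            } else {
                keyField = externalIdField;
            }
        } else if (operation == Operation.UPDATE) {
            keyField = "Id";
        }
        if (keyField != null && !schemaFields.contains(keyField)) {
            errors.add("Schema must contain key field '" + keyField + "'");
        }
        // Sorted copy gives a stable error order.
        for (String field : new TreeSet<>(schemaFields)) {
            if (field.equals(keyField)) {
                continue; // assumption: the key field is exempt from the creatable check
            }
            if (!creatableFields.contains(field)) {
                errors.add("Field '" + field + "' is not present or not creatable in the target sObject");
            }
        }
        return errors;
    }
}
```

Collecting all problems in a list, rather than failing on the first one, lets the user fix the schema in a single pass.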
Converting fields
We will have to convert logical types like date, datetime, time from long to string format accepted by Salesforce. Other types won't require converting.
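A rough sketch of such a conversion follows. It rests on several assumptions that are not stated above: dates arrive as days since the epoch, datetimes as microseconds since the epoch (UTC), times as microseconds since midnight, and Salesforce accepts `yyyy-MM-dd`, `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` and `HH:mm:ss.SSS'Z'` respectively. The class `SalesforceDateFormats` is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the plugin: long values to Salesforce strings.
final class SalesforceDateFormats {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ISO_LOCAL_DATE;
    private static final DateTimeFormatter DATE_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    private static final DateTimeFormatter TIME =
        DateTimeFormatter.ofPattern("HH:mm:ss.SSS'Z'");

    private SalesforceDateFormats() { }

    /** Assumes the value is a number of days since 1970-01-01. */
    static String formatDate(long daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch).format(DATE);
    }

    /** Assumes the value is microseconds since the epoch, interpreted as UTC. */
    static String formatDateTime(long microsSinceEpoch) {
        // floorDiv/floorMod keep pre-1970 values correct.
        long seconds = Math.floorDiv(microsSinceEpoch, 1_000_000L);
        int nanos = (int) (Math.floorMod(microsSinceEpoch, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC).format(DATE_TIME);
    }

    /** Assumes the value is microseconds since midnight. */
    static String formatTime(long microsSinceMidnight) {
        return LocalTime.ofNanoOfDay(microsSinceMidnight * 1_000L).format(TIME);
    }
}
```

Sub-millisecond precision is dropped by these patterns, which would need confirming against what the target fields actually store.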
...
It would be great to emit errors from this plugin, since it is very common for some (but not all) of the records to fail to insert. Unfortunately, SinkEmitter does not support emitting errors, and even if it did, it would not help: we do not know whether a record failed at the transform stage. We only know that once the batches have been submitted and the results checked, which happens on task finalization (RecordWriter#close).
Upsert - Salesforce requires the user to provide an external id field name. This field is used as the basis for the upsert (Salesforce uses it to decide whether objects are the same).
The Id field, which is present on all Salesforce sObjects, can be used for that. The user can also create a custom field and mark it as "External Id" in the Salesforce UI.
Update - Specifying an external id field is not supported. For updates Salesforce always uses the 'Id' field as the basis.
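These rules, together with the early validation of the "Upsert external id field" setting, might be captured as in the sketch below. The names `OperationKey` and `resolveKeyField` are hypothetical; the behaviour follows the rules above: upsert needs an external id field name, insert and update must not have one, and update always uses `Id`.

```java
import java.util.Optional;

// Hypothetical helper, not part of the plugin: picks the matching field per operation.
final class OperationKey {
    enum Operation { INSERT, UPSERT, UPDATE }

    private OperationKey() { }

    /** Returns the field Salesforce matches records on, or empty for insert. */
    static Optional<String> resolveKeyField(Operation operation, String externalIdField) {
        boolean hasExternalId = externalIdField != null && !externalIdField.isEmpty();
        switch (operation) {
            case UPSERT:
                if (!hasExternalId) {
                    throw new IllegalArgumentException("Upsert requires an external id field");
                }
                return Optional.of(externalIdField);
            case UPDATE:
                if (hasExternalId) {
                    throw new IllegalArgumentException("External id field is not supported for update");
                }
                return Optional.of("Id");
            default: // INSERT
                if (hasExternalId) {
                    throw new IllegalArgumentException("External id field is not supported for insert");
                }
                return Optional.empty();
        }
    }
}
```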