Salesforce Batch Source: Bulk API limitations handling


Introduction

Salesforce Batch Source plugin uses Bulk API by default. Bulk API can not be used with some types of SOQL queries and compound fields. To avoid such limitations replacing of unsupported field types with supported equivalent or by switching to SOAP API options can be used.

Use case

LimitationUse CaseApproach

Compound fields

Support compound field types: address, location.

For example, Account SObject contains two compound fields: BillingAddress, ShippingAddress.

User should be able to execute SOQL which includes such fields:

SELECT ShippingAddress, BillingAddress FROM Account

Implement compound fields processing strategy.

EXCLUDE - compound fields will be excluded from SOQL. Processing type: Bulk API. Default strategy.

FLATTEN - compound fields will be replaced with individual fields. Processing type: Bulk API.

Example: BillingAddress field will be replaced with:

BillingStreet, BillingCity, BillingState, BillingPostalCode, BillingCountry, BillingLatitude, BillingLongitude, BillingGeocodeAccuracy

INCLUDE - compound field will be fetched in JSON format. Processing type: Bulk API + SOAP API. Note: performance may be impacted.

Example:

{
"city": "San Francisco",
"country": "US",
"countryCode": null,
"geocodeAccuracy": null,
"latitude": null,
"longitude": null,
"postalCode": "94087",
"state": "CA",
"stateCode": null,
"street": "The Landmark @ One Market"
}

More details: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/compound_fields.htm

OFFSET clause

User should be able to execute SOQL with offset:

SELECT Name, Id FROM Opportunity OFFSET 30

Adapt existing WideObject query processing to query all SOQL queries with OFFSET clause. Processing type: Bulk API + SOAP API.

Note: performance may be impacted.


Aggregate functions

Functions: COUNT, SUM

User should be able to execute SOQL with aggregate functions:

SELECT Id,count(Name) cnt,count(Name),sum(Amount),count(CloseDate) FROM Opportunity GROUP BY Id


New SOAP API query() reader should be implemented.

Limitations:

  1. Query cannot be processed in batches. All work will be done by one mapper. Performance can be significantly impacted.
  2. Wide queries are not supported, i.e. if user provides query that exceeds the 20,000 character limit, switch to WideObject query processing is not possible, pipeline will fail.
Aggregate Relationships

Relationships: ROLLUP, GROUP BY CUBE

User should be able to execute SOQL with aggregate relationships:

SELECT Status, LeadSource, COUNT(Name) cnt FROM Lead GROUP BY ROLLUP(Status, LeadSource)

SELECT Type, BillingCountry, GROUPING(Type) grpType, GROUPING(BillingCountry) grpCty, COUNT(id) accts FROM Account GROUP BY CUBE(Type, BillingCountry) ORDER BY GROUPING(Type), GROUPING(BillingCountry)

New SOAP API query() reader should be implemented.

Limitations:

  1. Query cannot be processed in batches. All work will be done by one mapper. Performance can be significantly impacted.
  2. Wide queries are not supported, i.e. if user provides query that exceeds the 20,000 character limit, switch to WideObject query processing is not possible, pipeline will fail.
Nested SOQL queries

User should be able to execute SOQL with nested SOQL query:

SELECT Id,Account.Name, (SELECT Id,Name,Contact.LastName FROM Account.Contacts) FROM Account

New SOAP API query() reader should be implemented.

Limitations:

  1. Query cannot be processed in batches. All work will be done by one mapper. Performance can be significantly impacted.
  2. Wide queries are not supported, i.e. if user provides query that exceeds the 20,000 character limit, switch to WideObject query processing is not possible, pipeline will fail.

Plugin Type

This is not a new plugin, but enhancements to the existing plugin.

Design

Approach(s)

Aggregate queries

1. SOQL aggregate functions will not be allowed without alias in order to generate proper CDAP schema.

Though Salesforce allows such queries it gives generated name for fields without alias (ex: expr0, expr1 depending on the position in the query) but we don't know which this information until we start processing the results of the query, plus such naming is not useful for further result processing in the pipeline. That's why we will enforce user to add aliases. In the case when alias is not present, pipeline will fail during validation stage, informing user that he needs to indicate alias.

Example of valid query: SELECT MIN(closedate) Amt FROM Opportunity

Example of invalid query: SELECT MIN(closedate) FROM Opportunity 

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_groupby_alias.htm

2. Schema type for the field with aggregate function will be determined based on function type.

Note: in Salesforce date functions (ex: CALENDAR_MONTH) are used only as aggregate functions.

List of functions supported by Salesforce:

Identify functions (functions that have the same type as field over which they are applied): MAX, MIN.

Numeric functions (schema type LONG):

A. AVG, SUM, COUNT(), COUNT(fieldName), COUNT_DISTINCT, GROUPING

B. CALENDAR_MONTH, CALENDAR_QUARTER, CALENDAR_YEAR, DAY_IN_MONTH, DAY_IN_WEEK, DAY_IN_YEAR, FISCAL_MONTH, FISCAL_QUARTER, FISCAL_YEAR, HOUR_IN_DAY, WEEK_IN_MONTH, WEEK_IN_YEAR.

Date functions (schema type DATE): DAY_ONLY.

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_agg_functions.htm

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_groupby_grouping.htm

https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_date_functions.htm

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999
  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2



Table of Contents

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature