In some instances, the results of a pipeline may need to be posted to an external web service. For example, a processing pipeline might send messages to Slack via its REST endpoint, or send notifications to a third-party website. This sink sends the messages from the pipeline to an external HTTP endpoint.
Use case(s)
I would like to post a notification to Slack every time a user of my website sees a 500 error. I would like to set up a real-time Spark Streaming pipeline that reads my weblog data, filters for messages that have a 500 error, and posts a custom message to Slack with details from the message, such as the URL.
I am leveraging a third-party reporting tool for updating metrics in a dashboard. I would like to create a real-time pipeline that generates those metrics and posts them to the third-party reporting service. I would like to use a real-time Spark Streaming pipeline, configure windows for aggregations, and then send the aggregated stats to the third party using this HTTP Sink.
User Stories
As a pipeline developer, I would like to post data to an external web service by providing the request method (GET, POST, PUT, DELETE), URL, payload (if POST or PUT), request headers, and timeouts.
As a pipeline developer, I would like to be able to define a custom POST payload leveraging fields from the message.
As a pipeline developer, I would like to batch my updates if required, so that the plugin posts to the external service only after n messages have accumulated.
As a pipeline developer, I would like the plugin to retry a configurable number of times before failing the pipeline.
As a pipeline developer, I would like to be able to send basic auth credentials by providing a username and password in the config.
As a pipeline developer, I would like to be able to send to HTTP and HTTPS endpoints.
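The retry and basic-auth stories above could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the plugin's implementation: `basic_auth_header`, `send_with_retries`, and the injected `send` callable are hypothetical names, and treating connection errors and 5xx responses as retriable failures is an assumption this page does not specify.

```python
import base64
from typing import Callable


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header from the configured credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def send_with_retries(send: Callable[[], int], num_retries: int) -> int:
    """Call `send` once, then retry up to `num_retries` more times.

    `send` is a hypothetical callable that performs the request and returns
    the HTTP status code. Assumption: an OSError (connection failure, timeout)
    or a 5xx status counts as a failure worth retrying.
    """
    last_error = None
    for _attempt in range(num_retries + 1):
        try:
            status = send()
            if status < 500:
                return status
            last_error = RuntimeError(f"server returned {status}")
        except OSError as exc:
            last_error = exc
    # Every attempt failed: raise so that the pipeline fails.
    raise RuntimeError(
        f"request failed after {num_retries + 1} attempts"
    ) from last_error
```

Injecting `send` keeps the retry policy separate from the HTTP client, so the policy can be exercised without a network connection.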
Plugin Type
BatchSink
Configurables
This section defines properties that are configurable for this plugin.
| User Facing Name | Type | Description | Constraints | Macro Enabled? |
|---|---|---|---|---|
| URL | String | Required. The URL to post data to. | | yes |
| Request Method | Select | The HTTP request method. | GET, POST, PUT, DELETE | |
| Batch Size | String | The number of messages to batch before sending | > 0, default 1 (no batching) | yes |
| Format | Select | The format to send the message in. JSON will format the entire input record to json and send it as a payload. Form will convert the input message to a query string and send it in the payload. Custom will leverage the request body field to send. | JSON, Form, Custom | |
| Request Body | String | Optional request body. Only required if Custom format is specified. | | yes |
| Content Type | String | Used to specify the Content-Type header. | | yes |
| Request Headers | KeyValue | An optional string of header values to send in each request where the keys and values are delimited by a colon (":") and each pair is delimited by a newline ("\n"). | | yes |
| Should Follow Redirects? | Select | Whether to automatically follow redirects. Defaults to true. | true, false | |
| Number of Retries | Select | The number of times the request should be retried if the request fails. Defaults to 3. | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | |
| Connect Timeout | String | The time in milliseconds to wait for a connection. Set to 0 for infinite. Defaults to 60000 (1 minute). | | |
| Read Timeout | String | The time in milliseconds to wait for a read. Set to 0 for infinite. Defaults to 60000 (1 minute). | | |
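As a rough illustration of how the properties above might be checked and parsed, here is a minimal Python sketch. The function names `parse_request_headers` and `validate_config` are hypothetical, and the error behaviour (raising `ValueError` on a malformed header line or an out-of-range value) is an assumption; the constraints themselves come from the descriptions above.

```python
def parse_request_headers(raw: str) -> dict:
    """Parse 'key:value' pairs, one per line, into a dict of headers."""
    headers = {}
    for line in raw.split("\n"):
        if not line.strip():
            continue  # ignore blank lines
        if ":" not in line:
            raise ValueError(f"malformed header (missing ':'): {line!r}")
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()
    return headers


def validate_config(batch_size: str, num_retries: int,
                    connect_timeout_ms: str, read_timeout_ms: str) -> None:
    """Check the documented constraints; raise ValueError on violation."""
    if int(batch_size) <= 0:
        raise ValueError("Batch Size must be greater than 0")
    if not 0 <= num_retries <= 10:
        raise ValueError("Number of Retries must be between 0 and 10")
    timeouts = (("Connect Timeout", connect_timeout_ms),
                ("Read Timeout", read_timeout_ms))
    for name, value in timeouts:
        # 0 means "wait indefinitely", so only negative values are rejected.
        if int(value) < 0:
            raise ValueError(f"{name} must be 0 or a positive number")
```

For example, `parse_request_headers("Accept:application/json\nX-Token: a:b")` yields a dict with the keys `Accept` and `X-Token`, with the second value keeping its embedded colon.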
Design / Implementation Tips
Please use HTTPPoller and HTTPCallback in Hydrator plugins as a reference.
If a user selects JSON, the Content-Type header should be set to application/json. For Form, it should be set to application/x-www-form-urlencoded.
When formatting the message as a query string, don't forget to URL-encode the values.
We will need to define some sort of macro language so that the user can leverage message fields in their POST payload. For example, I might define my payload as { "messageType" : "update", "name" : "%{firstName}" }, where %{firstName} will be replaced with the value of the firstName field in the incoming message.
For batching, the messages in a batch will be sent separated by a newline (\n) character.
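The formatting tips above might look roughly like this in Python. It is a sketch under assumptions rather than the plugin's design: the names `CONTENT_TYPES`, `format_record`, and `build_batch_body` are hypothetical, the `%{field}` syntax follows the example in this section, and raising `KeyError` when a placeholder names a missing field is an assumption.

```python
import json
import re
from urllib.parse import urlencode

# Content-Type implied by the selected format. For Custom, the user-supplied
# Content Type property would be used instead (assumption).
CONTENT_TYPES = {
    "JSON": "application/json",
    "Form": "application/x-www-form-urlencoded",
}

# Matches placeholders such as %{firstName}.
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def format_record(record: dict, fmt: str, request_body: str = "") -> str:
    """Render one input record according to the Format property."""
    if fmt == "JSON":
        return json.dumps(record)
    if fmt == "Form":
        # urlencode percent-encodes keys and values.
        return urlencode(record)
    if fmt == "Custom":
        # Substitute each %{field} with the record's value for that field.
        return PLACEHOLDER.sub(lambda m: str(record[m.group(1)]), request_body)
    raise ValueError(f"unknown format: {fmt}")


def build_batch_body(records: list, fmt: str, request_body: str = "") -> str:
    """Join the formatted messages of one batch with newline characters."""
    return "\n".join(format_record(r, fmt, request_body) for r in records)
```

Note that this sketch inserts field values into a Custom body verbatim; a value containing a double quote would break a JSON template, so a real implementation would likely need format-aware escaping.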
Design
Approach(s)
Properties
NFR
1. If the user enables SSL validation, they are expected to add the certificate to the truststore of each machine.