...
Code Block |
---|
{ "config": { "source": { "name": "tweets", "plugin": { "name": "Twitter", "properties": { ... }, "artifact": { "name": "etl-lib", "version": "3.3.0", "scope": "system" } }, "properties": { ... } }, ... } } |
Plugin artifact information is optional. If none is given, the most recent artifact is chosen. Individual artifact fields are also optional; the system goes through available artifacts and selects the first one that matches all of the fields that are given.
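For example, a config could pin only the artifact name and scope and omit the version; per the matching rules above, the most recent etl-lib artifact in the system scope would then be selected (a sketch of the behavior described here, not a finalized spec):

Code Block |
---|
{ "config": { "source": { "name": "tweets", "plugin": { "name": "Twitter", "properties": { ... }, // "version" omitted: the most recent 'etl-lib' artifact in the system scope is chosen "artifact": { "name": "etl-lib", "scope": "system" } } } } } |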
...
A user wants to read from an AWS Kinesis source and persist the data into a dataset, since data in an AWS Kinesis stream is available for only 24 hours.
Code Block "config": { "source": { "name": "MyKinesisSource", "plugin": { "name": "Kinesis", "properties": { "credentials": "...", "shardId" : "id442222", "limit" : "5MBps" // we can skip this if we use KCL rather than the REST API (have to investigate) } } .... } }
Kafka sink:
- A user might want to move from an existing streaming platform such as Kinesis to Apache Kafka.
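A Kafka sink could be configured in the same style as the sources above. A hypothetical sketch follows; the plugin name "Kafka" and the broker/topic property names are assumptions for illustration, not a finalized spec:

Code Block |
---|
"config": { "sink": { "name": "ExampleKafkaSink", "plugin": { "name": "Kafka", "properties": { "brokers": "host1:9092,host2:9092", "topic": "tweets", // partitioning key, sync vs. async publishing, etc. are open questions "async": "true" } } .... } } |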
...
- A user might want to read from an HTTP endpoint exposed by services that provide public data, such as YouTube, Foursquare, or the AWS API, instead of implementing separate clients for each of them.
Ingest data from an existing system that serves data via an HTTP API.
Code Block "config": { "source": { "name": "ExampleHTTPSource", "plugin": { "name": "Http", "properties": { "method": "GET", "url": "http://example.com", "headers": "{...}", "body": "file-location" } } .... } }
FTP Source:
Code Block |
---|
"config": { "source": { "name": "FTPExampleSource", "plugin": { "name": "Ftp", "properties": { "hostname": "example.com", "port" : "4433", "credentials": "...", // optional configs "page-size" : 25, "keep-alive" : "5m", // there are a few parameters to set if the remote server is in a different time zone, uses non-English data, etc. "language-code" : "..", "time-zone" : "..." } } .... } } |
Counting Dataset:
...