Introduction
A batch sink for writing to Google Cloud Storage in Avro format.
Use-case
This sink is used whenever you need to write to Google Cloud Storage in Avro format. For example, you might want to create daily snapshots of a database by reading the entire contents of a table and writing it to this sink, so that other programs can then analyze the contents of the specified files. The output of each run is stored in a directory, with a user-customized name, inside a specified bucket in Google Cloud Storage.
Properties
referenceName: This will be used to uniquely identify this sink for lineage, annotating metadata, etc.
projectID: The Google Cloud project ID that has access to the specified bucket.
jsonKey: The path to the JSON key file of the service account used for GCS access.
path: The directory inside the bucket where the data is stored. This needs to be a new directory for each run.
bucketKey: The bucket inside Google Cloud Storage in which to store the data.
fileSystemProperties: A JSON string representing a map of properties needed for the distributed file system. The property names needed for GCS (the project ID and the JSON key file) are included as 'fs.gs.project.id' and 'google.cloud.auth.service.account.json.keyfile'; see the sketch after this list.
schema: The Avro schema of the records being written to the sink, as a JSON object.
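As an illustration, a fileSystemProperties value for GCS might look like the following sketch; the project ID and key file path are placeholders:

    {
      "fs.gs.project.id": "projectid",
      "google.cloud.auth.service.account.json.keyfile": "path_to_jsonKeyFile"
    }

And a schema value describing records with a timestamp and a body (the same schema used in the example below) would be:

    {
      "type": "record",
      "name": "etlSchemaBody",
      "fields": [
        { "name": "ts", "type": "long" },
        { "name": "body", "type": "string" }
      ]
    }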
Example
This example will write to a Google Cloud Storage output located at gs://bucket/directory. It will write data in Avro format using the given schema. Every time the pipeline runs, the user should specify a new directory name.
    {
      "name": "GCSAvro",
      "plugin": {
        "name": "GCSAvro",
        "type": "batchsink",
        "label": "GCSAvro",
        "artifact": {
          "name": "core-plugins",
          "version": "1.4.0-SNAPSHOT",
          "scope": "SYSTEM"
        },
        "properties": {
          "schema": "{ \"type\":\"record\", \"name\":\"etlSchemaBody\", \"fields\":[ {\"name\":\"ts\",\"type\":\"long\"}, {\"name\":\"body\",\"type\":\"string\"}]}",
          "bucketKey": "bucket",
          "path": "directory",
          "projectID": "projectid",
          "jsonKey": "path_to_jsonKeyFile",
          "referenceName": "name"
        }
      }
    }
Requirements
- The user should provide the correct project ID, one that they have access to.
- The user should provide the path to the JSON key file of a service account that has create permission on the bucket. (A sketch of this file's structure appears after this list.)
- The user should specify a bucket inside Google Cloud Storage.
- The user should specify the time limit for the query.
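For reference, the file referenced by jsonKey is the standard service account key file downloaded from Google Cloud. It generally has the following structure; all values below are placeholders, and some fields are elided:

    {
      "type": "service_account",
      "project_id": "projectid",
      "private_key_id": "0123456789abcdef",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "name@projectid.iam.gserviceaccount.com",
      "client_id": "123456789012345678901"
    }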