S3 IAM Role Authentication and Server-Side Encryption
- Russ Savage
- Shashank
Introduction
This feature adds an option for IAM role-based authentication to the existing S3 source and sink plugins, and an option for server-side encryption to the existing S3 sink plugins.
Use case(s)
- In the S3 source and S3 sink (Avro and Parquet) plugins, users should be able to select the authentication mechanism for S3, including IAM role-based authentication.
- In the S3 sink (Avro and Parquet) plugins, users should be able to enable server-side encryption on S3.
User Stories
- As a pipeline user, I want an option for IAM role-based authentication in the S3 source and sink plugins in Hydrator.
- As a pipeline user, I want the access ID and access key to be mandatory for the Access Credentials authentication method.
- As a pipeline user, I want an option to enable server-side encryption in the S3 sink plugins (Avro and Parquet) in Hydrator.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
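The following new configuration properties would be added to the S3 plugins:
| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Authentication Method | Select | Authentication method used to access S3: Access Credentials or IAM. Defaults to Access Credentials. | Access ID and access key are mandatory for Access Credentials. |
| Server Side Encryption | Select | Whether to enable server-side encryption (AES256) on S3. Defaults to True. | S3 sink plugins (Avro and Parquet) only. |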
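The user stories require the access ID and access key only for the Access Credentials method. A minimal Java sketch of that validation rule, assuming the plugin receives the widget value as a plain string; the class name `S3AuthConfigSketch` and the `accessID`/`accessKey` field names are hypothetical and not taken from the Hydrator code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the authentication-related S3 plugin settings and their validation.
 * Not the actual Hydrator plugin class; the key field names are assumptions.
 */
final class S3AuthConfigSketch {
  static final String ACCESS_CREDENTIALS = "Access Credentials";
  static final String IAM = "IAM";

  private final String authenticationMethod; // widget value; null or empty means "use the default"
  private final String accessID;             // hypothetical field name
  private final String accessKey;            // hypothetical field name

  S3AuthConfigSketch(String authenticationMethod, String accessID, String accessKey) {
    this.authenticationMethod = authenticationMethod;
    this.accessID = accessID;
    this.accessKey = accessKey;
  }

  /** Applies the documented default (Access Credentials) when no method is selected. */
  String effectiveMethod() {
    return (authenticationMethod == null || authenticationMethod.isEmpty())
        ? ACCESS_CREDENTIALS
        : authenticationMethod;
  }

  /** Returns validation errors; an empty list means the settings are valid. */
  List<String> validate() {
    List<String> errors = new ArrayList<>();
    String method = effectiveMethod();
    if (ACCESS_CREDENTIALS.equals(method)) {
      // Both keys are mandatory for the Access Credentials method.
      if (isBlank(accessID)) {
        errors.add("Access ID is required for the Access Credentials authentication method.");
      }
      if (isBlank(accessKey)) {
        errors.add("Access key is required for the Access Credentials authentication method.");
      }
    } else if (!IAM.equals(method)) {
      errors.add("Unsupported authentication method: " + method);
    }
    // For IAM, any access ID/key values are not checked and not used.
    return errors;
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }
}
```

Returning a list of messages rather than throwing on the first problem lets a UI report both missing keys at once.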
Design
Authentication:
{
  "widget-type": "select",
  "label": "Authentication Method",
  "name": "authenticationMethod",
  "widget-attributes": {
    "values": [
      "Access Credentials",
      "IAM"
    ],
    "default": "Access Credentials"
  }
}
Server-side encryption:
{
  "widget-type": "select",
  "label": "Server Side Encryption",
  "name": "enableEncryption",
  "widget-attributes": {
    "values": [
      "True",
      "False"
    ],
    "default": "True"
  }
}
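Both select widgets store their selection as a string. The sketch below shows one way the plugin could map those strings to typed values and apply the documented defaults (Access Credentials and True) when a property is unset. `S3WidgetValues` and its methods are hypothetical helpers, not part of the CDAP API:

```java
import java.util.Locale;

/** Hypothetical helper that turns the widget string values into typed settings. */
final class S3WidgetValues {
  enum AuthenticationMethod { ACCESS_CREDENTIALS, IAM }

  /** Maps the "authenticationMethod" value; null or empty falls back to Access Credentials. */
  static AuthenticationMethod parseAuthenticationMethod(String value) {
    if (value == null || value.isEmpty() || value.equals("Access Credentials")) {
      return AuthenticationMethod.ACCESS_CREDENTIALS;
    }
    if (value.equals("IAM")) {
      return AuthenticationMethod.IAM;
    }
    throw new IllegalArgumentException("Unknown authentication method: " + value);
  }

  /** Maps the "enableEncryption" value ("True"/"False"); null or empty defaults to true. */
  static boolean parseEnableEncryption(String value) {
    if (value == null || value.isEmpty()) {
      return true;
    }
    switch (value.toLowerCase(Locale.ROOT)) {
      case "true":
        return true;
      case "false":
        return false;
      default:
        throw new IllegalArgumentException("enableEncryption must be True or False: " + value);
    }
  }
}
```

Rejecting unexpected values, instead of silently treating them as false, avoids accidentally writing unencrypted data when a pipeline JSON is edited by hand.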
Approach(s)
1. When the user selects the IAM role-based authentication method, the properties related to the access keys are omitted.
2. When the user selects IAM authentication and enables server-side encryption, fs.s3a.server-side-encryption-algorithm would be set to AES256 (the only supported value).
3. When the user selects Access Credentials authentication and enables server-side encryption, fs.s3n.server-side-encryption-algorithm would be set to AES256 (the only supported value).
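The three rules above can be expressed as a function that produces the Hadoop properties to set. This is a sketch under stated assumptions: the encryption property names and the AES256 value come from the approach above, while `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are the standard Hadoop s3n key properties and it is assumed, not confirmed, that the plugins set exactly these. `S3HadoopProperties` is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the Hadoop properties implied by the approach rules. */
final class S3HadoopProperties {
  static final String SSE_ALGORITHM = "AES256"; // the only supported value

  /**
   * @param iam              true for "IAM", false for "Access Credentials"
   * @param enableEncryption value of the "Server Side Encryption" setting
   */
  static Map<String, String> build(boolean iam, boolean enableEncryption,
                                   String accessID, String accessKey) {
    Map<String, String> props = new LinkedHashMap<>();
    if (iam) {
      // Rule 1: no key properties; the s3a client is expected to use the instance's IAM role.
      if (enableEncryption) {
        // Rule 2: IAM + encryption uses the s3a encryption property.
        props.put("fs.s3a.server-side-encryption-algorithm", SSE_ALGORITHM);
      }
    } else {
      // Access Credentials: assumed standard s3n key properties.
      props.put("fs.s3n.awsAccessKeyId", accessID);
      props.put("fs.s3n.awsSecretAccessKey", accessKey);
      if (enableEncryption) {
        // Rule 3: Access Credentials + encryption uses the s3n encryption property.
        props.put("fs.s3n.server-side-encryption-algorithm", SSE_ALGORITHM);
      }
    }
    return props;
  }
}
```

Returning a plain `Map` keeps the sketch independent of Hadoop's `Configuration` class; a plugin would copy these entries into whatever configuration object it passes to the output format.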
References:
https://issues.apache.org/jira/browse/HADOOP-10568
https://hortonworks.github.io/hdp-aws/s3-encryption/index.html
https://issues.apache.org/jira/browse/HADOOP-13131
If a file is encrypted, this can be verified in the file's details section in the AWS console.
Limitation(s)
1. For all S3 plugins, only S3 regions that support both signature versions (Version 2 and Version 4) are supported.
2. IAM role-based authentication can only be used in an AWS (EC2) environment; non-EC2 environments cannot be used.
3. IAM authentication requires the s3a Hadoop client (URI scheme: s3a://).
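Limitation 3 could be enforced at configure time by checking the path's URI scheme before a pipeline runs. A minimal sketch using `java.net.URI`; `S3PathCheck` is a hypothetical name, and only the IAM case is checked because the limitations above do not restrict the scheme for Access Credentials:

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical configure-time check for the s3a-only limitation of IAM authentication. */
final class S3PathCheck {
  /** Returns an error message if the path cannot be used with the chosen method. */
  static Optional<String> checkScheme(String path, boolean iam) {
    String scheme = URI.create(path).getScheme(); // null if the path has no scheme
    if (iam && !"s3a".equals(scheme)) {
      return Optional.of("IAM role-based authentication requires an s3a:// path, got: " + path);
    }
    return Optional.empty();
  }
}
```

Failing early with a clear message is friendlier than letting the job start and fail later with a credentials error from the S3 client.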
Test Case(s)
- S3 batch source with IAM role-based authentication
- S3 batch source with access credentials
- S3 Avro sink with IAM role-based authentication
- S3 Avro sink with access credentials
- S3 Parquet sink with IAM role-based authentication
- S3 Parquet sink with access credentials
Sample Pipeline
- S3SourceCredentials.json
- S3SinkAvroIAM-cdap-data-pipeline.json
- S3SinkAvroCredentials-cdap-data-pipeline.json
- S3SinkParquetIAM_1-cdap-data-pipeline.json
- S3SinkParquetCredentials1-cdap-data-pipeline.json
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature