...
For filesets, by default, all files and directories are created with the file system's default umask, and with the group of the parent directory. This can be overridden by dataset properties. For example, this configures read, write and execute for the owner and the group "etl":
Code Block language java
PartitionedFileSetProperties.builder()
  ...
  .setFilePermissions("770")
  .setFileGroup("etl")
  .build();
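To make the "770" value concrete, the following is a minimal sketch that decodes a three-digit octal mode into the familiar rwx notation, assuming the value is interpreted as a standard POSIX octal mode. The class name `OctalPermissionsSketch` and its method are hypothetical helpers for illustration and are not part of the CDAP API; only JDK classes are used.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of CDAP: decodes an octal mode such as "770".
final class OctalPermissionsSketch {
  // Permission bits ordered from the most significant bit (owner read)
  // to the least significant bit (others execute).
  private static final PosixFilePermission[] BITS = {
    PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
    PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
    PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
  };

  static String toSymbolic(String octal) {
    int mode = Integer.parseInt(octal, 8);
    Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
    for (int i = 0; i < BITS.length; i++) {
      // Bit 8 is owner read, bit 0 is others execute.
      if ((mode & (1 << (8 - i))) != 0) {
        perms.add(BITS[i]);
      }
    }
    return PosixFilePermissions.toString(perms);
  }

  public static void main(String[] args) {
    // Owner and group get read, write and execute; others get nothing.
    System.out.println(toSymbolic("770"));
  }
}
```

Under that assumption, "770" grants full access to the owner and to the group set with setFileGroup, and no access to anyone else.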
For tables, additional permissions can be granted as part of the table creation. For example, this allows read and write for the user "joe" and read-only access for all members of the group "etl":
Code Block language java
TableProperties.builder()
  ...
  .setTablePermissions(ImmutableMap.of("joe", "RW", "@etl", "R"))
  .build();
Note that this is also needed for PartitionedFileSets, because their partition metadata is stored in an HBase table.
- Explore permissions in Hive must be granted manually outside of CDAP.
...
By default, the Explore table for a dataset is in the enclosing namespace's database and named dataset_<name>. In CDAP 4.1, you can configure a custom Hive database and table name as follows:
Code Block language java
PartitionedFileSetProperties.builder()
  ...
  .setExploreDatabaseName("my_database")
  .setExploreTableName("clicks_gold")
  .build();
Note that the database must already exist, because CDAP will not attempt to create it.
...
- FileSetProperties.setUseExisting(true) (or DATA_USE_EXISTING / "data.use.existing") to reuse an existing location and Hive table. The dataset will assume that it does not own the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, the data will not be deleted.
- FileSetProperties.setPossessExisting(true) (or DATA_POSSESS_EXISTING / "data.possess.existing") to assume ownership of an existing location and Hive table. The dataset will assume that it owns the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, all data will be deleted, including the previously existing data and Hive partitions.
...
Add the following to your hbase-site.xml:
Code Block language xml
<property>
  <name>hbase.security.exec.permission.checks</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
...
- All keytabs must be present on the local filesystem on which CDAP Master is running.
- These keytabs must be located under a path in one of the following formats, and the cdap user must have read access to all of them:
- /<dir1>/<dir2>/${name}.keytab
- /<dir1>/<dir2>/${name}/${name}.keytab
This path is provided to CDAP as a configuration parameter in cdap-site.xml, for example:
Code Block language xml title cdap-site.xml
<property>
  <name>security.keytab.path</name>
  <value>/etc/security/keytabs/${name}.keytab</value>
</property>
Here, CDAP replaces ${name} with the short user name of the Kerberos principal that it is impersonating.
Note: You will need to restart CDAP for the configuration changes to take effect.
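The ${name} substitution described above can be sketched as follows. This is an illustration only, not CDAP's implementation: the class `KeytabPathSketch` and its methods are hypothetical, and the short name is assumed to be the primary component of the principal (the part before the first "/" or "@"). On a real cluster, Hadoop derives short names through its configured auth_to_local rules, which may map principals differently.

```java
// Hypothetical sketch, not CDAP code: resolves a keytab path template.
final class KeytabPathSketch {

  // Assumption: the short name is the primary component of the principal,
  // i.e. everything before the first '/' (instance) or '@' (realm).
  static String shortName(String principal) {
    int end = principal.length();
    int at = principal.indexOf('@');
    if (at >= 0) {
      end = at;
    }
    int slash = principal.indexOf('/');
    if (slash >= 0 && slash < end) {
      end = slash;
    }
    return principal.substring(0, end);
  }

  // Replaces every literal ${name} in the template with the short name.
  static String resolve(String template, String principal) {
    return template.replace("${name}", shortName(principal));
  }

  public static void main(String[] args) {
    System.out.println(resolve(
        "/etc/security/keytabs/${name}.keytab",
        "someuser/somehost.net@SOMEKDC.NET"));
  }
}
```

With the template from the cdap-site.xml example and the principal someuser/somehost.net@SOMEKDC.NET, this sketch yields /etc/security/keytabs/someuser.keytab, which is the file that would need to exist on the CDAP Master host.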
...
Add the following to your hive-site.xml and restart Hive:
Code Block language xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<property>
  <name>hive.users.in.admin.role</name>
  <value>hive,cdap</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>
...
Note that your hive-site.xml must also be configured to allow these properties to be modified at runtime. Specifically, you will need the following configuration in your hive-site.xml:
Code Block language xml
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>explore.*|mapreduce.job.queuename|mapreduce.job.complete.cancel.delegation.tokens|spark.hadoop.mapreduce.job.complete.cancel.delegation.tokens|mapreduce.job.credentials.binary|hive.exec.submit.local.task.via.child|hive.exec.submitviachild</value>
</property>
...
Make sure that Hive is configured to impersonate users who can create or access entities in CDAP. This can be done by adding the following properties to your core-site.xml. The first property allows Hive to impersonate users belonging to "group1" and "group2", and the second property allows Hive to impersonate from all hosts.
Code Block language xml
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>group1,group2</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
...
Creating an application from an existing artifact:
Code Block
curl -v -X PUT http://hostname.net:11015/v3/namespaces/{namespace-id}/apps/{app-id} \
  -d '{"artifact":{"name":"{artifact-name}","version":"{artifact-version}","scope":"USER"},"principal":"someuser/somehost.net@SOMEKDC.NET"}' \
  -H "Authorization: Bearer your_access_token"
...
Creating a stream with an owner:
Code Block
curl -X PUT -v http://somehost.net:11015/v3/namespaces/{namespace-id}/streams/{stream-name} \
  -d '{ "ttl": 1, "principal": "someuser/somehost.net@SOMEKDC.NET" }' \
  -H "Authorization: Bearer your_access_token"
...
Creating a dataset with an owner:
Code Block
curl -v -X PUT http://somehost.net:11015/v3/namespaces/{namespace-id}/data/datasets/{dataset-id} \
  -d '{ "typeName": "table", "properties": {}, "principal": "someuser/somehost.net@SOMEKDC.NET" }' \
  -H "Authorization: Bearer your_access_token"
Querying dataset properties for owner information:
Code Block
curl -v http://hostname.net:11015/v3/namespaces/{namespace-id}/data/datasets/{dataset-name} \
  -H "Authorization: Bearer your_access_token"
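For callers that prefer Java over curl, the two dataset requests above could be built with the JDK's java.net.http API (Java 11 or later). This is a sketch under stated assumptions: the class `DatasetOwnerRequests` and its methods are hypothetical, the host and port are copied from the curl examples, and the namespace, dataset name and token passed in are placeholders to replace with real values.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of CDAP: builds the same requests as the curl examples.
final class DatasetOwnerRequests {
  // Host and port taken from the curl examples; adjust for your cluster.
  static final String BASE = "http://somehost.net:11015/v3/namespaces/";

  static URI datasetUri(String namespace, String dataset) {
    return URI.create(BASE + namespace + "/data/datasets/" + dataset);
  }

  // PUT request that creates a table dataset owned by the given principal.
  // Assumes the principal contains no characters that need JSON escaping.
  static HttpRequest create(String namespace, String dataset, String principal, String token) {
    String body = "{ \"typeName\": \"table\", \"properties\": {}, \"principal\": \""
        + principal + "\" }";
    return HttpRequest.newBuilder(datasetUri(namespace, dataset))
        .header("Authorization", "Bearer " + token)
        .PUT(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  // GET request that reads the dataset properties, including owner information.
  static HttpRequest query(String namespace, String dataset, String token) {
    return HttpRequest.newBuilder(datasetUri(namespace, dataset))
        .header("Authorization", "Bearer " + token)
        .GET()
        .build();
  }

  public static void main(String[] args) {
    HttpRequest put = create("default", "mytable",
        "someuser/somehost.net@SOMEKDC.NET", "your_access_token");
    System.out.println(put.method() + " " + put.uri());
  }
}
```

The sketch only builds the requests. To send one, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running CDAP instance.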
...