

  • FileSetProperties.setUseExisting(true) (or DATA_USE_EXISTING / "data.use.existing") to reuse an existing location and Hive table. The dataset will assume that it does not own the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, the data will not be deleted. 
  • FileSetProperties.setPossessExisting(true) (or DATA_POSSESS_EXISTING / "data.possess.existing") to take ownership of an existing location and Hive table. The dataset will assume that it owns the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, all data will be deleted, including the previously existing data and Hive partitions.
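As an illustrative sketch (not from the original page), the same switches can also be supplied as raw dataset properties through the Datasets REST API, since they are exposed as the string keys "data.use.existing" and "data.possess.existing". The host, namespace, dataset name, and token below are placeholders:

```shell
curl -v -X PUT http://somehost.net:11015/v3/namespaces/{namespace-id}/data/datasets/{dataset-id} -d '{ "typeName": "fileSet", "properties": { "data.use.existing": "true" } }' -H "Authorization: Bearer your_access_token"
```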

...

To support app-level impersonation, wherein applications, datasets, and streams can each have their own owner and operations performed in CDAP impersonate their respective owners, CDAP needs access to the owner principals and their associated keytabs. The owner principal of an entity is provided during entity creation (see the REST API documentation in the next section).

...

Code Block
languagexml
titlehive-site.xml
<property>
	<name>hive.security.authorization.sqlstd.confwhitelist.append</name>
	<value>explore.*|mapreduce.job.queuename|mapreduce.job.complete.cancel.delegation.tokens|spark.hadoop.mapreduce.job.complete.cancel.delegation.tokens|mapreduce.job.credentials.binary|hive.exec.submit.local.task.via.child|hive.exec.submitviachild</value>
</property>

Hive Proxy Users


To enable Hive to impersonate other users, set the following in hive-site.xml:

Code Block
languagexml
titlehive-site.xml
<property>
	<name>hive.server2.enable.doAs</name>
	<value>true</value>
</property>


Make sure that Hive is configured to impersonate users who can create or access entities in CDAP. This can be done by adding the following properties to your core-site.xml. The first property allows Hive to impersonate users belonging to "group1" and "group2", and the second allows Hive to impersonate users connecting from any host.

Code Block
languagexml
titlecore-site.xml
<property>
	<name>hadoop.proxyuser.hive.groups</name>
	<value>group1,group2</value>
</property>

<property>
	<name>hadoop.proxyuser.hive.hosts</name>
	<value>*</value>
</property>

...

CDAP Authorization (if needed):

Additionally, you might want to enable CDAP authorization. For details on how to enable authorization in CDAP and manage privileges, please refer to our documentation here: http://docs.cask.co/cdap/current/en/admin-manual/security/authorization.html?highlight=authorization

Note

Please note that the above cluster configuration is not a comprehensive guide to enabling authorization and/or impersonation on a Hadoop cluster. You might need to add or remove configuration depending on your environment.

...

Code Block
titlecreating namespace from cli
create namespace testns principal rsinha/<host-name>@<realm> group-name deployers keytab-URI /etc/security/keytabs/rsinha.keytab

Application Lifecycle 

Loading an artifact:

Code Block
titleloading artifact from cli
load artifact SportResults-4.1.0-SNAPSHOT.jar 

...

Existing REST API. Please see: http://docs.cask.co/cdap/current/en/reference-manual/http-restful-api/lifecycle.html#details-of-a-deployed-application
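For example, fetching the details of a deployed application with that API might look like the following sketch (host, namespace, application name, and token are placeholders):

```shell
curl -X GET http://somehost.net:11015/v3/namespaces/{namespace-id}/apps/{app-id} -H "Authorization: Bearer your_access_token"
```

Per the linked documentation, the JSON response describes the deployed application's details.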

Streams

Creating a stream with an owner:

Code Block
titlecreating stream REST API
curl -X PUT -v http://somehost.net:11015/v3/namespaces/{namespace-id}/streams/{stream-name} -d '{ "ttl": 1, "principal": "someuser/somehost.net@SOMEKDC.NET" }' -H "Authorization: Bearer your_access_token"

...

Existing REST API. Please see: http://docs.cask.co/cdap/current/en/reference-manual/http-restful-api/stream.html#getting-and-setting-stream-properties
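For example, a stream's properties can be fetched with the following sketch (host, namespace, stream name, and token are placeholders):

```shell
curl -X GET http://somehost.net:11015/v3/namespaces/{namespace-id}/streams/{stream-name}/properties -H "Authorization: Bearer your_access_token"
```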

Datasets

Creating a dataset with an owner:

Code Block
titlecreating dataset REST API
curl -v -X PUT http://somehost.net:11015/v3/namespaces/{namespace-id}/data/datasets/{dataset-id} -d '{ "typeName": "table", "properties": {}, "principal": "someuser/somehost.net@SOMEKDC.NET" }' -H "Authorization: Bearer your_access_token"

...