...

  • Tables in the corresponding HBase namespace to create Table-based datasets
    • If you provide a custom HBase namespace when creating the namespace, it is your responsibility to ensure that every application principal can create tables in this namespace:
      • In the HBase shell: grant '<user>', 'AC', '@<namespace>'
      • or: grant '@<group>', 'AC', '@<namespace>'
    • If you let CDAP create the namespace, it will use the group name specified in the namespace configuration to issue the grant '@<group>', 'AC', '@<namespace>'. In this case, all application owners must be in that group.
  • Tables in the namespace's Hive database, to be able to enable Explore for datasets. Depending on the Hive authorization settings:
    • The application user must be privileged to create tables in the database
    • Hive must be configured to grant all privileges to the user that creates a table (depending on Hive configuration, this may not be the case)
    • For any sharing between applications that requires additional permissions, these must be granted manually.
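The grant statements above differ only in whether the grantee is a user or a group (groups carry an "@" prefix, as does the namespace). A minimal sketch of that convention follows; the class and method names are hypothetical and not part of CDAP or HBase, and the resulting strings are meant to be pasted into the HBase shell by an administrator.

```java
class HBaseGrants {
  // Hypothetical helper: formats the HBase shell grant statement described above.
  // Groups are prefixed with '@'; the namespace is always prefixed with '@'.
  static String grantStatement(String principal, boolean isGroup, String namespace) {
    String grantee = isGroup ? "@" + principal : principal;
    // 'AC' is the permission string used in the text above.
    return String.format("grant '%s', 'AC', '@%s'", grantee, namespace);
  }

  public static void main(String[] args) {
    // "alice", "etl" and "etl_ns" are example values only.
    System.out.println(grantStatement("alice", false, "etl_ns"));
    System.out.println(grantStatement("etl", true, "etl_ns"));
  }
}
```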

...

  • For filesets, by default, all files and directories are created with the file system's default umask, and with the group of the parent directory. This can be overridden by dataset properties. For example, this configures read, write and execute for the owner and the group "etl":

    Code Block
    languagejava
    PartitionedFileSetProperties.builder()
      ...
      .setFilePermissions("770")
      .setFileGroup("etl")
      .build();
  • For tables, additional permissions can be granted as part of the table creation. For example, this allows read and write for the user "joe" and read only for all members of the group "etl":

    Code Block
    languagejava
    TableProperties.builder()
      ...
      .setTablePermissions(ImmutableMap.of("joe", "RW", "@etl", "R"))
      .build();

    Note that this is also needed for PartitionedFileSets, because their partition metadata is stored in an HBase table.

  • Explore permissions in Hive must be granted manually outside of CDAP. 
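To make the fileset permission string concrete: "770" in the fileset example above is a three-digit octal mode, one digit each for owner, group, and others. The sketch below only decodes such a string into the familiar rwx notation; the class and method names are hypothetical and it does not call any CDAP API.

```java
class OctalPermissions {
  // Hypothetical helper: converts a three-digit octal mode such as "770"
  // into symbolic form (owner, group, others), for example "rwxrwx---".
  static String toSymbolic(String octal) {
    if (!octal.matches("[0-7]{3}")) {
      throw new IllegalArgumentException("Expected three octal digits, got: " + octal);
    }
    StringBuilder symbolic = new StringBuilder();
    for (char digit : octal.toCharArray()) {
      int bits = digit - '0';
      symbolic.append((bits & 4) != 0 ? 'r' : '-'); // read
      symbolic.append((bits & 2) != 0 ? 'w' : '-'); // write
      symbolic.append((bits & 1) != 0 ? 'x' : '-'); // execute
    }
    return symbolic.toString();
  }

  public static void main(String[] args) {
    // Owner and group get read, write and execute; others get nothing.
    System.out.println(toSymbolic("770"));
  }
}
```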

...

By default, the Explore table for a dataset is in the enclosing namespace's database and named dataset_<name>. In CDAP 4.1, you can configure a custom Hive database and table name as follows:

    Code Block
    languagejava
    PartitionedFileSetProperties.builder()
      ...
      .setExploreDatabaseName("my_database")
      .setExploreTableName("clicks_gold")
      .build();

    Note that the database must already exist, as CDAP will not attempt to create it.

...

  • FileSetProperties.setUseExisting(true) (or DATA_USE_EXISTING / "data.use.existing") to reuse an existing location and Hive table. The dataset will assume that it does not own the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, the data will not be deleted. 
  • FileSetProperties.setPossessExisting(true) (or DATA_POSSESS_EXISTING / "data.possess.existing") to assume ownership of an existing location and Hive table. The dataset will assume that it owns the existing data in that location and Hive table, and therefore, when you delete or truncate the dataset, all data will be deleted, including the previously existing data and Hive partitions.
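The difference between the two options comes down to whether deleting or truncating the dataset removes the underlying data. The sketch below models that decision from a plain property map using the two property keys named above. It is an illustration only, not CDAP's implementation: the class, enum, and method names are hypothetical, and two behaviors are assumptions of this sketch, namely that a dataset with neither property set owns its data, and that "possess" takes precedence if both are set.

```java
import java.util.Collections;
import java.util.Map;

class ExistingDataPolicy {
  // Property keys as named in the text above.
  static final String DATA_USE_EXISTING = "data.use.existing";
  static final String DATA_POSSESS_EXISTING = "data.possess.existing";

  enum Ownership { CREATED_BY_DATASET, EXISTING_NOT_OWNED, EXISTING_OWNED }

  static Ownership ownership(Map<String, String> props) {
    // Assumption: if both are set, "possess" wins. Boolean.parseBoolean(null) is false.
    if (Boolean.parseBoolean(props.get(DATA_POSSESS_EXISTING))) {
      return Ownership.EXISTING_OWNED;
    }
    if (Boolean.parseBoolean(props.get(DATA_USE_EXISTING))) {
      return Ownership.EXISTING_NOT_OWNED;
    }
    // Assumption: without either property, the dataset created and owns its data.
    return Ownership.CREATED_BY_DATASET;
  }

  // Data survives delete/truncate only when the dataset reuses data it does not own.
  static boolean deletesDataOnDeleteOrTruncate(Map<String, String> props) {
    return ownership(props) != Ownership.EXISTING_NOT_OWNED;
  }

  public static void main(String[] args) {
    Map<String, String> useExisting = Collections.singletonMap(DATA_USE_EXISTING, "true");
    Map<String, String> possessExisting = Collections.singletonMap(DATA_POSSESS_EXISTING, "true");
    System.out.println("use existing deletes data: " + deletesDataOnDeleteOrTruncate(useExisting));
    System.out.println("possess existing deletes data: " + deletesDataOnDeleteOrTruncate(possessExisting));
  }
}
```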

...

Code Block
languagexml
titlehive-site.xml
<property>
	<name>hive.server2.enable.doAs</name>
	<value>false</value>
</property>
<property>
	<name>hive.users.in.admin.role</name>
	<value>hive,cdap</value>
</property>
<property>
	<name>hive.security.authorization.manager</name>
	<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
	<name>hive.security.authorization.enabled</name>
	<value>true</value>
</property>
<property>
	<name>hive.security.authenticator.manager</name>
	<value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>

...

Code Block
languagexml
titlehive-site.xml
<property>
	<name>hive.security.authorization.sqlstd.confwhitelist.append</name>
	<value>explore.*|mapreduce.job.queuename|mapreduce.job.complete.cancel.delegation.tokens|spark.hadoop.mapreduce.job.complete.cancel.delegation.tokens|mapreduce.job.credentials.binary|hive.exec.submit.local.task.via.child|hive.exec.submitviachild|hive.lock.*</value>
</property>
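The whitelist value is a Java regular expression made of alternatives separated by "|". To check whether a given configuration key would be covered before editing hive-site.xml, the expression can be tried out with java.util.regex. This is a sketch under the assumption that Hive matches the whole key against the expression; the class and method names are hypothetical. Note that "." in these patterns matches any character, not only a literal dot.

```java
import java.util.regex.Pattern;

class ConfWhitelistCheck {
  // The whitelist expression from the hive-site.xml snippet above.
  static final Pattern WHITELIST = Pattern.compile(
      "explore.*|mapreduce.job.queuename"
          + "|mapreduce.job.complete.cancel.delegation.tokens"
          + "|spark.hadoop.mapreduce.job.complete.cancel.delegation.tokens"
          + "|mapreduce.job.credentials.binary"
          + "|hive.exec.submit.local.task.via.child"
          + "|hive.exec.submitviachild|hive.lock.*");

  // Assumption: the entire key must match one of the alternatives.
  static boolean isAllowed(String configKey) {
    return WHITELIST.matcher(configKey).matches();
  }

  public static void main(String[] args) {
    // Example keys only; the first is covered by "hive.lock.*", the second by no alternative.
    System.out.println("hive.lock.numretries allowed: " + isAllowed("hive.lock.numretries"));
    System.out.println("hive.exec.parallel allowed: " + isAllowed("hive.exec.parallel"));
  }
}
```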

Hive Proxy Users

If you do not use SQL-based authorization, you may want to configure Hive to be able to impersonate other users. Set the following in hive-site.xml:

Code Block
languagexml
titlehive-site.xml
<property>
	<name>hive.server2.enable.doAs</name>
	<value>true</value>
</property>


Note that CDAP's Explore service ignores this setting and needs to be able to impersonate users who can create/access entities in CDAP. This can be done by adding the following property in your core-site.xml. The first option allows CDAP to impersonate users belonging to "group1" and "group2", and the second option allows Hive to impersonate on all hosts.

...