Data Catalog Metastore

Apache Hive is a data warehousing framework that supports reading, writing, and managing large datasets that reside in distributed systems.

Data Catalog Metastore provides a highly available and scalable central repository of metadata for a Hive cluster. It stores metadata for data structures such as databases, tables, and partitions in a relational database, backed by files maintained in Object Storage. Apache Spark SQL makes use of Data Catalog Metastore for this purpose.

OCI Data Flow, OCI Big Data Service, and OCI Data Science service instances access the Data Catalog Metastore to securely store and retrieve schema definitions for objects in unstructured and semi-structured data assets, such as Object Storage.

Prerequisites

Before you create a metastore in Data Catalog, you must create two buckets in Oracle Object Storage to contain the Managed and External tables.
  • Managed Table: The Metastore manages the table object.
  • External Table: As a user, you manage the table object.

While creating the metastore in Data Catalog, you provide the URIs of the buckets in which the managed and external tables reside. The format of the URI must be oci://<bucket_name>@<namespace_name>/<folder name of your choice>.
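
The URI format above can be checked before you enter it in the metastore form. The following is a minimal Python sketch, not an Oracle-provided utility: the function names and the sample bucket and namespace values are hypothetical, and the pattern only checks the overall shape of the URI, not whether the bucket exists.

```python
import re

# Shape of a table location: oci://<bucket_name>@<namespace_name>/<folder>
_OCI_URI = re.compile(
    r"^oci://(?P<bucket>[^@/\s]+)@(?P<namespace>[^@/\s]+)/(?P<folder>\S+)$"
)


def build_table_location(bucket: str, namespace: str, folder: str) -> str:
    """Assemble a table location URI and verify that it has the expected shape."""
    uri = f"oci://{bucket}@{namespace}/{folder.strip('/')}"
    if not _OCI_URI.match(uri):
        raise ValueError(f"not a valid table location URI: {uri!r}")
    return uri


def parse_table_location(uri: str) -> dict:
    """Split a table location URI into bucket, namespace, and folder."""
    match = _OCI_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid table location URI: {uri!r}")
    return match.groupdict()


def check_distinct_locations(managed_uri: str, external_uri: str) -> None:
    """Reject identical managed and external locations (hypothetical safety check)."""
    if managed_uri.rstrip("/") == external_uri.rstrip("/"):
        raise ValueError("managed and external tables must not share a location")


if __name__ == "__main__":
    # Sample values are made up for illustration.
    managed = build_table_location("managed-bkt", "mytenancy", "warehouse")
    external = build_table_location("external-bkt", "mytenancy", "external")
    check_distinct_locations(managed, external)
    print(managed)
    print(external)
```

The distinct-location check only catches identical URIs; it does not detect one folder nested inside the other.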

Note

We recommend that you do not use the same location for managed and external tables. If both types of tables are in the same directory, deletion of data from managed tables could result in loss of data from external tables as well.

For more information, see HDFS connector.

Required IAM Policies

You must add policies that allow the metastore resource principal to access the storage locations.

As a prerequisite, create a dynamic group that includes the metastore. In the following policy statements, its OCID is represented by <dg-metastore-ocid>:
ALLOW dynamic-group <dg-metastore-ocid> to read buckets in tenancy where any {all {target.bucket.name='<managed-table-location-bucket>', request.region='<managed-table-location-bucket-region>'}, all {target.bucket.name='<external-table-location-bucket>', request.region='<external-table-location-bucket-region>'}}
ALLOW dynamic-group <dg-metastore-ocid> to manage objects in tenancy where all {target.bucket.name='<managed-table-location-bucket>', request.region='<managed-table-location-bucket-region>'}
ALLOW dynamic-group <dg-metastore-ocid> to read objects in tenancy where all {target.bucket.name='<external-table-location-bucket>', request.region='<external-table-location-bucket-region>'}

Note

If you want to allow Metastore to dynamically create buckets per database using the spark.conf parameter setting oci.dcat.metastore.create.bucket.per.db = true, add the following policies:
ALLOW dynamic-group <dg-metastore-ocid> to manage buckets in tenancy where any {all {target.bucket.name='<managed-table-location-bucket>', request.region='<managed-table-location-bucket-region>'}}
ALLOW dynamic-group <dg-metastore-ocid> to read buckets in tenancy where any {all {target.bucket.name='<external-table-location-bucket>', request.region='<external-table-location-bucket-region>'}}
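
Because the storage policy statements differ only in bucket names and regions, they can be generated from a template to avoid mismatched braces or quotes. The sketch below is one hypothetical way to do that in Python: `storage_policies` is not part of any OCI SDK, the sample values are made up, and the output is plain text that you would still paste into an IAM policy yourself.

```python
def storage_policies(
    dynamic_group: str,
    managed_bucket: str,
    managed_region: str,
    external_bucket: str,
    external_region: str,
    bucket_per_db: bool = False,
) -> list[str]:
    """Render the metastore storage policy statements as strings."""

    def condition(bucket: str, region: str) -> str:
        # One "all {...}" clause matching a bucket in a region.
        return f"all {{target.bucket.name='{bucket}', request.region='{region}'}}"

    managed = condition(managed_bucket, managed_region)
    external = condition(external_bucket, external_region)
    prefix = f"ALLOW dynamic-group {dynamic_group}"

    statements = [
        f"{prefix} to read buckets in tenancy where any {{{managed}, {external}}}",
        f"{prefix} to manage objects in tenancy where {managed}",
        f"{prefix} to read objects in tenancy where {external}",
    ]
    if bucket_per_db:
        # Extra statements for oci.dcat.metastore.create.bucket.per.db = true.
        statements += [
            f"{prefix} to manage buckets in tenancy where any {{{managed}}}",
            f"{prefix} to read buckets in tenancy where any {{{external}}}",
        ]
    return statements


if __name__ == "__main__":
    # Sample values are made up for illustration.
    for line in storage_policies(
        "<dg-metastore-ocid>",
        "managed-bkt", "us-ashburn-1",
        "external-bkt", "us-ashburn-1",
        bucket_per_db=True,
    ):
        print(line)
```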

Only an Admin and the users in the Administrators group have access to all Data Catalog metastore resources. As an Admin, you can use the following policies to provide access to all metastore resources.

allow <any-user> to manage data-catalog-metastore-assets in compartment <compartment-name>
allow group <any-group> to manage data-catalog-metastore-assets in compartment <compartment-name>

These policies are mandatory, unless coarse-grained policies are defined for catalog, database, or table-level resources. For more information, see Required IAM Policies for Coarse-grained Access Control.

Coarse-grained Access Control in Data Catalog Metastore

Data Catalog Metastore provides coarse-grained access control through the Identity and Access Management (IAM) service to prevent accidental access to, or modification of, resources created by another user. As an admin, you can grant access to resources such as catalogs, databases, and tables by using the predefined policies shown in the Resources List on the metastore details page. For more information about viewing metastore resources, see Viewing Metastore Resources List.

Note

By default, coarse-grained access control is disabled, so you must define the required policies to access all metastore resources. See Required IAM Policies. After defining these policies, set the spark.conf parameter oci.dcat.metastore.enable.cgac to true (oci.dcat.metastore.enable.cgac = true).

Required IAM Policies for Coarse-grained Access Control

Use the following policies to provide access to catalog, database, or table-level resources:

Note

  • Applying policies at the database or table level requires permission to manage <hive-default>.
  • When policies are applied at the database or table level, the users in the group can't see or list the databases and tables created by another user.

Allow access to catalog-level resources:

  • Using target.metastore.catalog.name
    allow group <any-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.name = '<metastore-catalog-name>'}
  • Using target.metastore.catalog.key
    allow group <any-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.key = '<metastore-catalog-key>'}
    Note

    The name, key, and sample policy statement are available on the metastore page under the Resources section.
    When creating a policy for a new catalog, you must also create a policy for the default database created as part of that catalog.
    allow group <any-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.name = '<metastore-catalog-name>'}
    allow group <any-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '<metastore-catalog-name.default>'}

Allow a <specific-group> to access database or table-level resources for a specific catalog, so that users in the group have access to the child resources of that catalog:

  • Using target.metastore.catalog.name
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '<metastore-catalog-name.default>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.name = '<metastore-catalog-name>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '*'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.table.name = '*'}
  • Using target.metastore.catalog.key
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.key = '<catalog-ID>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '<metastore-catalog-name.default>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.key = '*'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.table.key = '*'}

Allow a <specific-group> to access database-level resources for a specific catalog by applying one of the following policies:

  • Using target.metastore.database.name
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.name = '<metastore-catalog-name>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '<catalog-name>.<database-name>'}
  • Using target.metastore.database.key
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.key = '<metastore-database-ID>'}

Allow a <specific-group> to access table-level resources for a specific database by applying one of the following policies:

  • Using target.metastore.table.name
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.catalog.name = '<metastore-catalog-name>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.database.name = '<catalog-name>.<database-name>'}
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.table.name = '<catalog-name>.<database-name>.<table-name>'}
  • Using target.metastore.table.key
    allow group <specific-group> to manage data-catalog-metastore-assets in compartment <compartment-name> where all {target.metastore.id = '<metastore-id>', target.metastore.table.key = '<metastore-table-ID>'}
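
The coarse-grained statements above share one shape and differ only in the target attribute and its value, with database and table names qualified as <catalog-name>.<database-name> and <catalog-name>.<database-name>.<table-name>. As an illustration, the following Python sketch renders the name-based statements for table-level access. The helpers `cgac_policy` and `table_policies_by_name` are hypothetical, the sample values are made up, and the output is plain policy text rather than a call to any OCI API.

```python
# Target attributes that appear in the policy statements in this section.
_ATTRIBUTES = {
    "catalog.name", "catalog.key",
    "database.name", "database.key",
    "table.name", "table.key",
}


def cgac_policy(group: str, compartment: str, metastore_id: str,
                attribute: str, value: str) -> str:
    """Render one coarse-grained access control statement."""
    if attribute not in _ATTRIBUTES:
        raise ValueError(f"unknown target attribute: {attribute!r}")
    return (
        f"allow group {group} to manage data-catalog-metastore-assets "
        f"in compartment {compartment} where all "
        f"{{target.metastore.id = '{metastore_id}', "
        f"target.metastore.{attribute} = '{value}'}}"
    )


def table_policies_by_name(group: str, compartment: str, metastore_id: str,
                           catalog: str, database: str, table: str) -> list[str]:
    """Render the catalog, database, and table statements for one table."""
    return [
        cgac_policy(group, compartment, metastore_id, "catalog.name", catalog),
        cgac_policy(group, compartment, metastore_id,
                    "database.name", f"{catalog}.{database}"),
        cgac_policy(group, compartment, metastore_id,
                    "table.name", f"{catalog}.{database}.{table}"),
    ]


if __name__ == "__main__":
    # Sample values are made up for illustration.
    for line in table_policies_by_name(
        "analysts", "sales", "<metastore-id>", "hive", "db1", "orders"
    ):
        print(line)
```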