Data Science and Data Flow Integration Groups and Policies

This section covers the creation of dynamic groups, policies, and buckets that are required before using Data Science with Data Flow.

Basic Tenancy Setup

These steps cover the minimal setup you need to run simple workloads using the Livy integration from Data Science notebook sessions.
Data Flow
  • Create the required buckets, named dataflow-logs and dataflow-warehouse, in your tenancy (a scripted version is sketched after this list). For more information, see the Set Up Administration section of the Data Flow documentation.
  • Create a dynamic group of Data Flow runs (<df-dynamic-group>) in a specific compartment:
    ALL {resource.type='dataflowrun', resource.compartment.id='<compartment_id>'}
  • Create a policy to authorize the Data Flow runs to access the Object Storage bucket (<bucket_name>) where your data is located:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ANY {target.bucket.name='<bucket_name>', target.bucket.name='dataflow-logs', target.bucket.name='dataflow-warehouse'}
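
If you prefer to script these steps, the following is a minimal sketch using the OCI Python SDK. It assumes the SDK is configured through ~/.oci/config and run by a user authorized to manage dynamic groups, policies, and buckets; the group and policy names are illustrative placeholders, not fixed values.

    # Minimal sketch using the OCI Python SDK (pip install oci).
    # OCIDs and the group/policy names below are placeholders.
    import oci

    config = oci.config.from_file()      # reads ~/.oci/config
    tenancy_id = config["tenancy"]
    compartment_id = "<compartment_id>"  # compartment that holds the Data Flow runs

    # Create the two required buckets.
    os_client = oci.object_storage.ObjectStorageClient(config)
    namespace = os_client.get_namespace().data
    for bucket in ("dataflow-logs", "dataflow-warehouse"):
        os_client.create_bucket(
            namespace,
            oci.object_storage.models.CreateBucketDetails(
                name=bucket, compartment_id=compartment_id
            ),
        )

    # Dynamic groups and policies live in the tenancy (root compartment).
    identity = oci.identity.IdentityClient(config)
    identity.create_dynamic_group(
        oci.identity.models.CreateDynamicGroupDetails(
            compartment_id=tenancy_id,
            name="df-dynamic-group",
            description="Data Flow runs in the target compartment",
            matching_rule=(
                "ALL {resource.type='dataflowrun', "
                f"resource.compartment.id='{compartment_id}'}}"
            ),
        )
    )

    # Authorize the Data Flow runs to manage objects in the three buckets.
    identity.create_policy(
        oci.identity.models.CreatePolicyDetails(
            compartment_id=tenancy_id,
            name="df-bucket-access",  # placeholder policy name
            description="Data Flow runs manage objects in the required buckets",
            statements=[
                "ALLOW DYNAMIC-GROUP 'df-dynamic-group' TO MANAGE objects IN TENANCY "
                "WHERE ANY {target.bucket.name='<bucket_name>', "
                "target.bucket.name='dataflow-logs', "
                "target.bucket.name='dataflow-warehouse'}"
            ],
        )
    )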
Data Science
  • Create a dynamic group of notebook sessions (<ds-dynamic-group>) in a specific compartment:
    ALL {resource.type='datasciencenotebooksession', resource.compartment.id='<compartment_id>'}
  • Create a policy to authorize notebook sessions to manage Data Flow runs (scripted in the sketch after this list):
    ALLOW DYNAMIC-GROUP '<ds-dynamic-group>' TO MANAGE dataflow-family IN COMPARTMENT '<your-compartment-name>'
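
The Data Science side can be scripted the same way. This sketch continues the one above, reusing identity, tenancy_id, and compartment_id; the group and policy names are again placeholders.

    # Continues the sketch above; reuses identity, tenancy_id, compartment_id.
    identity.create_dynamic_group(
        oci.identity.models.CreateDynamicGroupDetails(
            compartment_id=tenancy_id,
            name="ds-dynamic-group",
            description="Notebook sessions in the target compartment",
            matching_rule=(
                "ALL {resource.type='datasciencenotebooksession', "
                f"resource.compartment.id='{compartment_id}'}}"
            ),
        )
    )

    identity.create_policy(
        oci.identity.models.CreatePolicyDetails(
            compartment_id=tenancy_id,
            name="ds-manage-dataflow",  # placeholder policy name
            description="Notebook sessions manage Data Flow runs",
            statements=[
                "ALLOW DYNAMIC-GROUP 'ds-dynamic-group' TO MANAGE dataflow-family "
                "IN COMPARTMENT '<your-compartment-name>'"
            ],
        )
    )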

Advanced Tenancy Setup

Depending on your use case, you might need to configure access to other services on Oracle Cloud Infrastructure.

Access Hive Metastore from a Data Flow session
  • Create a dynamic group of Data Catalog Hive Metastores (<dcat-hive-group>) in your tenancy:
    ANY {resource.type='datacatalogmetastore'}
  • Create a policy to authorize Data Flow to access the metastore:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE data-catalog-metastores IN TENANCY
  • Create a policy to authorize the Data Catalog Metastores to access the Object Storage buckets. Grant your dynamic group of Data Catalog Metastores access to the buckets where the data is stored and to the buckets where the managed and external tables are stored (scripted in the sketch below):
    ALLOW DYNAMIC-GROUP '<dcat-hive-group>' TO READ buckets IN TENANCY
    ALLOW DYNAMIC-GROUP '<dcat-hive-group>' TO MANAGE object-family IN TENANCY WHERE ANY {target.bucket.name = '<bucket_name>', target.bucket.name = '<managed-table-location-bucket>', target.bucket.name = '<external-table-location-bucket>'}
For more general policies to use Hive Metastore with Data Flow, see Hive Metastore Policies in the Data Flow documentation.
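
As a sketch under the same assumptions as the code above, the metastore dynamic group and policies can be created as follows; dcat-hive-group and the policy name are placeholders.

    # Continues the sketch above; group and policy names are placeholders.
    identity.create_dynamic_group(
        oci.identity.models.CreateDynamicGroupDetails(
            compartment_id=tenancy_id,
            name="dcat-hive-group",
            description="Data Catalog Hive Metastores in the tenancy",
            matching_rule="ANY {resource.type='datacatalogmetastore'}",
        )
    )

    identity.create_policy(
        oci.identity.models.CreatePolicyDetails(
            compartment_id=tenancy_id,
            name="hive-metastore-access",  # placeholder policy name
            description="Data Flow uses the metastore; metastores reach the buckets",
            statements=[
                "ALLOW DYNAMIC-GROUP 'df-dynamic-group' TO MANAGE "
                "data-catalog-metastores IN TENANCY",
                "ALLOW DYNAMIC-GROUP 'dcat-hive-group' TO READ buckets IN TENANCY",
                "ALLOW DYNAMIC-GROUP 'dcat-hive-group' TO MANAGE object-family IN TENANCY "
                "WHERE ANY {target.bucket.name='<bucket_name>', "
                "target.bucket.name='<managed-table-location-bucket>', "
                "target.bucket.name='<external-table-location-bucket>'}",
            ],
        )
    )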
Customize the Spark runtime environment with a published conda environment
  • Create a bucket called ds-conda-env in your tenancy (a scripted version is sketched after this list).
  • Create a policy to authorize notebook sessions to access the Object Storage bucket where the conda environment is stored:
    ALLOW DYNAMIC-GROUP '<ds-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ALL {target.bucket.name='ds-conda-env'}
  • Create a policy to authorize Data Flow to access the Object Storage bucket where the conda environment is stored:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ALL {target.bucket.name='ds-conda-env'}
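
A final sketch, continuing from the ones above, creates the bucket and attaches both statements to a single policy; the policy name is a placeholder.

    # Continues the sketch above; the policy name is a placeholder.
    os_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(
            name="ds-conda-env", compartment_id=compartment_id
        ),
    )

    identity.create_policy(
        oci.identity.models.CreatePolicyDetails(
            compartment_id=tenancy_id,
            name="conda-env-bucket-access",  # placeholder policy name
            description="Notebook sessions and Data Flow runs manage the conda bucket",
            statements=[
                "ALLOW DYNAMIC-GROUP 'ds-dynamic-group' TO MANAGE objects IN TENANCY "
                "WHERE ALL {target.bucket.name='ds-conda-env'}",
                "ALLOW DYNAMIC-GROUP 'df-dynamic-group' TO MANAGE objects IN TENANCY "
                "WHERE ALL {target.bucket.name='ds-conda-env'}",
            ],
        )
    )

After the conda environment is published to this bucket, it is typically referenced from the Data Flow session through the spark.archives Spark configuration property, as described in the Data Flow documentation.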
