Data Science and Data Flow Integration Groups and Policies
This section covers the creation of dynamic groups, policies, and buckets that are required before using Data Science with Data Flow.
Basic Tenancy Set Up
These steps cover the minimal setup you need to run simple workloads using the Livy integration from Data Science Notebook Sessions. Sketches of these steps using the OCI Python SDK follow the list.
- Data Flow
  - Create the required buckets called dataflow-logs and dataflow-warehouse in your tenancy. For more information, see the Set Up Administration section of the Data Flow documentation.
  - Create a dynamic group of Data Flow runs (<df-dynamic-group>) in a specific compartment:
    ALL {resource.type='dataflowrun', resource.compartment.id='<compartment_id>'}
  - Create a policy to authorize the Data Flow runs to access the Object Storage bucket (<bucket_name>) where your data is located:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ANY {target.bucket.name='<bucket_name>', target.bucket.name='dataflow-logs', target.bucket.name='dataflow-warehouse'}
- Data Science
  - Create a dynamic group of notebook sessions (<ds-dynamic-group>) in a specific compartment:
    ALL {resource.type='datasciencenotebooksession', resource.compartment.id='<compartment_id>'}
  - Create a policy to authorize notebook sessions to manage Data Flow runs:
    ALLOW DYNAMIC-GROUP '<ds-dynamic-group>' TO MANAGE dataflow-family IN COMPARTMENT '<your-compartment-name>'
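For reference, the Data Flow half of this setup can be scripted with the OCI Python SDK. The sketch below is illustrative rather than a definitive procedure: the <...> values are the placeholders from the steps above, and the policy name and description strings are assumptions you can rename freely.

import oci

config = oci.config.from_file()          # reads ~/.oci/config by default
tenancy_id = config["tenancy"]
compartment_id = "<compartment_id>"      # compartment holding the buckets and runs

# 1. Create the dataflow-logs and dataflow-warehouse buckets.
os_client = oci.object_storage.ObjectStorageClient(config)
namespace = os_client.get_namespace().data
for name in ("dataflow-logs", "dataflow-warehouse"):
    os_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(
            name=name, compartment_id=compartment_id
        ),
    )

identity = oci.identity.IdentityClient(config)

# 2. Dynamic group of Data Flow runs. Dynamic groups always live in the
#    root compartment, so compartment_id here is the tenancy OCID.
identity.create_dynamic_group(
    oci.identity.models.CreateDynamicGroupDetails(
        compartment_id=tenancy_id,
        name="<df-dynamic-group>",
        description="Data Flow runs in the target compartment",
        matching_rule="ALL {resource.type='dataflowrun', "
                      f"resource.compartment.id='{compartment_id}'}}",
    )
)

# 3. Policy letting those runs manage objects in the three buckets.
#    The statement is tenancy-wide, so the policy is attached at the root.
identity.create_policy(
    oci.identity.models.CreatePolicyDetails(
        compartment_id=tenancy_id,
        name="df-bucket-policy",    # illustrative name
        description="Data Flow run access to data and service buckets",
        statements=[
            "ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE objects IN TENANCY "
            "WHERE ANY {target.bucket.name='<bucket_name>', "
            "target.bucket.name='dataflow-logs', "
            "target.bucket.name='dataflow-warehouse'}"
        ],
    )
)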
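The Data Science half follows the same pattern, with the same caveats: <...> values are placeholders and the policy name is an illustrative assumption.

import oci

config = oci.config.from_file()
tenancy_id = config["tenancy"]
compartment_id = "<compartment_id>"

identity = oci.identity.IdentityClient(config)

# Dynamic group of notebook sessions in the target compartment.
identity.create_dynamic_group(
    oci.identity.models.CreateDynamicGroupDetails(
        compartment_id=tenancy_id,
        name="<ds-dynamic-group>",
        description="Notebook sessions in the target compartment",
        matching_rule="ALL {resource.type='datasciencenotebooksession', "
                      f"resource.compartment.id='{compartment_id}'}}",
    )
)

# Policy letting those notebook sessions manage Data Flow runs.
# Attached at the root so the statement can name any compartment.
identity.create_policy(
    oci.identity.models.CreatePolicyDetails(
        compartment_id=tenancy_id,
        name="ds-manage-dataflow",   # illustrative name
        description="Notebook sessions manage Data Flow",
        statements=[
            "ALLOW DYNAMIC-GROUP '<ds-dynamic-group>' TO MANAGE dataflow-family "
            "IN COMPARTMENT '<your-compartment-name>'"
        ],
    )
)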
Advanced Tenancy Set Up
Depending on your use case, you might need to configure access to other services on Oracle Cloud Infrastructure. Notebook cell sketches showing how a Data Flow session references these resources follow the list.
- Access Hive Metastore from a Data Flow session
  - Create a dynamic group of Data Catalog Hive Metastores (<dcat-hive-group>) in your tenancy:
    Any {resource.type = 'datacatalogmetastore'}
  - Create a policy to authorize Data Flow to access the metastore:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE data-catalog-metastores IN TENANCY
  - Create a policy to authorize the Data Catalog Metastores to access the Object Storage buckets. Grant your dynamic group of Data Catalog Metastores access to the buckets where the data is stored and the buckets where the managed and external tables are stored:
    ALLOW DYNAMIC-GROUP '<dcat-hive-group>' TO READ buckets IN TENANCY
    ALLOW DYNAMIC-GROUP '<dcat-hive-group>' TO MANAGE object-family IN TENANCY WHERE ANY {target.bucket.name = '<bucket_name>', target.bucket.name = '<managed-table-location-bucket>', target.bucket.name = '<external-table-location-bucket>'}
- Customize the Spark runtime environment with a published conda environment
  - Create a bucket called ds-conda-env in your tenancy.
  - Create a policy to authorize notebook sessions to access the Object Storage bucket where the conda environment is stored:
    ALLOW DYNAMIC-GROUP '<ds-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ALL {target.bucket.name='ds-conda-env'}
  - Create a policy to authorize Data Flow to access the Object Storage bucket where the conda environment is stored:
    ALLOW DYNAMIC-GROUP '<df-dynamic-group>' TO MANAGE objects IN TENANCY WHERE ALL {target.bucket.name='ds-conda-env'}
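Once the metastore policies are in place, a Data Flow session started from a notebook can reference the metastore by OCID. The cell below is a sketch that assumes the Data Flow magic extension shipped with the PySpark conda environments in Data Science; the display name is illustrative, and the exact JSON fields (including metastoreId) should be verified against the current Data Flow magic documentation.

# Run in a Data Science notebook session with a PySpark conda environment.
%load_ext dataflow.magics

# Create a Data Flow session backed by the Data Catalog Hive Metastore.
%create_session -l python -c '{"compartmentId": "<compartment_id>", "displayName": "metastore-session", "sparkVersion": "3.2.1", "driverShape": "VM.Standard2.1", "executorShape": "VM.Standard2.1", "numExecutors": 1, "metastoreId": "<metastore_ocid>"}'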
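A published conda environment is attached the same way, by pointing the spark.archives Spark property at the conda pack object in the ds-conda-env bucket; the #conda suffix tells Data Flow to treat the archive as a conda environment. This is a sketch: the namespace and object path are placeholders, and the property usage should be confirmed against the Data Flow custom conda documentation.

# Run in a Data Science notebook session with a PySpark conda environment.
%load_ext dataflow.magics

# Create a Data Flow session that runs on the published conda environment.
%create_session -l python -c '{"compartmentId": "<compartment_id>", "displayName": "conda-session", "sparkVersion": "3.2.1", "driverShape": "VM.Standard2.1", "executorShape": "VM.Standard2.1", "numExecutors": 1, "configuration": {"spark.archives": "oci://ds-conda-env@<namespace>/<path_to_conda_pack>#conda"}}'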