Manually Configuring a Data Science Tenancy
In this tutorial, you set up your tenancy for Data Science and test it by creating a notebook session.
This tutorial is intended for administrators because they have the required access permissions.
In this tutorial, you are:
1. Creating a Data Scientists User Group.
2. Creating a Compartment for Your Work.
3. (Optional) Creating a VCN and Subnet.
4. Creating Policies.
5. Creating a Dynamic Group with Policies.
6. Creating a Notebook Session.
Before You Begin
To perform this tutorial, you must have the following:
- A paid Oracle Cloud Infrastructure (OCI) account, or a new account with Oracle Cloud promotions. See Request and Manage Free Oracle Cloud Promotions.
- Administrator privilege for the OCI account.
- At least one user in your tenancy who wants to access the Data Science service. This user must be created in IAM.
1. Creating a Data Scientists User Group
Create a user group for the data scientists to work in.
2. Creating a Compartment for Your Work
Create a compartment for your data science resources.
- Open the navigation menu and click Identity & Security. Under Identity, click Compartments.
- Click Create Compartment.
- Name the new compartment data-science-work, and enter a description.
- Click Create Compartment.
- Confirm that the compartment appears in the compartments list.
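The Console steps above can also be scripted. The following is a minimal sketch, assuming the OCI Python SDK (the oci package) is installed and a default configuration file exists at ~/.oci/config. The helper names compartment_request and create_compartment and the default description text are hypothetical and used only for illustration.

```python
def compartment_request(parent_ocid, name="data-science-work",
                        description="Compartment for Data Science resources"):
    """Return the fields for a new compartment under parent_ocid.

    The default description is a placeholder; the tutorial only asks
    that you enter one.
    """
    if not parent_ocid.startswith("ocid1."):
        raise ValueError("parent_ocid must be an OCID, for example the tenancy OCID")
    return {
        "compartment_id": parent_ocid,  # parent compartment (or tenancy root)
        "name": name,
        "description": description,
    }


def create_compartment(fields):
    """Send the request with the OCI Python SDK (assumed to be installed).

    The import is done lazily so that the helper above can be used and
    tested without the SDK or any credentials.
    """
    import oci

    config = oci.config.from_file()  # reads ~/.oci/config by default
    identity = oci.identity.IdentityClient(config)
    details = oci.identity.models.CreateCompartmentDetails(**fields)
    return identity.create_compartment(details).data
```

Calling create_compartment(compartment_request(config["tenancy"])) would place the compartment directly under the root compartment; checking the compartments list in the Console afterwards is still a reasonable confirmation step.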
3. (Optional) Creating a VCN and Subnet
This step is optional. When you create a notebook session in Step 6. Creating a Notebook Session, you can choose to create a default network with the proper setup for notebook sessions.
You can skip creating a network and setting up subnets and gateways if you choose default networking when creating a notebook. If the default networking is configured in a notebook, you can't change it when reactivating the notebook.
This section shows users who need to use their own VCN how to create one and, later, how to choose the recommended subnet for notebook sessions. For example, if you're performing the Scheduling Data Science Job Runs tutorial, you create this network and use it for both the notebook session in Data Science and the workspace in the Data Integration service.
For egress access to the public internet, we recommend that you use a private subnet with a route to a NAT gateway. A NAT gateway gives instances in a private subnet access to the internet. The VCN that you create in this step includes a private subnet with egress access to the internet through the VCN's NAT gateway.
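To make the network layout concrete, the following sketch checks a planned address layout with Python's standard ipaddress module. The CIDR blocks (10.0.0.0/16 for the VCN, 10.0.0.0/24 and 10.0.1.0/24 for the subnets) and the route rule representation are assumptions chosen for illustration, not values taken from this tutorial.

```python
import ipaddress

# Assumed address plan; adjust to your own network.
VCN_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ipaddress.ip_network("10.0.0.0/24"),
    "private": ipaddress.ip_network("10.0.1.0/24"),  # recommended for notebook sessions
}

# Illustrative route rule for the private subnet: all traffic that is not
# local to the VCN leaves through the NAT gateway.
PRIVATE_ROUTE_RULE = {"destination": "0.0.0.0/0", "target": "nat-gateway"}


def validate_layout(vcn, subnets):
    """Raise ValueError if a subnet lies outside the VCN or overlaps another."""
    items = list(subnets.items())
    for name, net in items:
        if not net.subnet_of(vcn):
            raise ValueError(f"subnet {name} ({net}) is outside VCN {vcn}")
    for i, (name_a, net_a) in enumerate(items):
        for name_b, net_b in items[i + 1:]:
            if net_a.overlaps(net_b):
                raise ValueError(f"subnets {name_a} and {name_b} overlap")


validate_layout(VCN_CIDR, SUBNETS)
```

A check like this is only a planning aid; the gateways and route tables themselves are still created in the Networking service.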
4. Creating Policies
Before users start their notebook sessions, you must configure the Data Science policies.
Explanation for the policies:
- To allow the Data Science service to attach your VCN to your notebook session and route egress traffic from the notebook environment, add:
  allow service datascience to use virtual-network-family in compartment data-science-work
- To allow the data-scientists group to perform operations on all Data Science resources in the data-science-work compartment (projects, notebook sessions, models, model deployments, work requests, jobs, and job runs), add:
  allow group data-scientists to manage data-science-family in compartment data-science-work
- To allow those data scientists to use the VCN that you created and attach it to their notebook sessions, add:
  allow group data-scientists to use virtual-network-family in compartment data-science-work
- To allow those data scientists to create and manage buckets, such as adding artifacts and conda environments to buckets, add:
  allow group data-scientists to manage buckets in compartment data-science-work
  allow group data-scientists to manage objects in compartment data-science-work
Alternatively, to give data scientists administrative rights to their compartment, so that they can manage all the resources of OCI services in it instead of only named resource types such as buckets, objects, or the virtual network family, replace the preceding five policies with the following two policies:
allow group data-scientists to manage all-resources in compartment data-science-work
allow service datascience to use virtual-network-family in compartment data-science-work
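If you maintain these statements for more than one group or compartment, generating them from a template avoids typos. The following sketch only builds the statement strings listed in this step; the function name user_group_policies and its parameters are hypothetical.

```python
def user_group_policies(group="data-scientists",
                        compartment="data-science-work",
                        admin=False):
    """Return the policy statements from this step as a list of strings.

    admin=False gives the five fine-grained statements;
    admin=True gives the two-statement alternative.
    """
    service = (f"allow service datascience to use virtual-network-family "
               f"in compartment {compartment}")
    if admin:
        return [
            f"allow group {group} to manage all-resources in compartment {compartment}",
            service,
        ]
    return [
        service,
        f"allow group {group} to manage data-science-family in compartment {compartment}",
        f"allow group {group} to use virtual-network-family in compartment {compartment}",
        f"allow group {group} to manage buckets in compartment {compartment}",
        f"allow group {group} to manage objects in compartment {compartment}",
    ]


for statement in user_group_policies():
    print(statement)
```

The printed lines can be pasted into the policy builder. Note that the service statement is needed in both variants.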
5. Creating a Dynamic Group with Policies
Create a dynamic group for Data Science resources and allow this dynamic group to access other OCI resources, such as Object Storage and Logging.
To give OCI resources permission to access other OCI resources, you first add the resources to a dynamic group instead of a user group. Then you write policies that allow the dynamic group to access the specified resources. Here, your dynamic group has three Data Science resources: notebook sessions, model deployments, and job runs.
You can use this dynamic group to give notebook sessions and model deployments that are in the data-science-work compartment access to other OCI resources in the tenancy.
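Membership in a dynamic group is defined by matching rules. As a sketch, the rules for the three resource kinds named above could be generated as follows. The resource type names (datasciencenotebooksession, datasciencemodeldeployment, datasciencejobrun) and the rule form are assumptions based on common OCI usage and are not stated in this section, so verify them against the current Data Science documentation before using them.

```python
# Assumed resource.type values for the three Data Science resources.
RESOURCE_TYPES = {
    "notebook sessions": "datasciencenotebooksession",
    "model deployments": "datasciencemodeldeployment",
    "job runs": "datasciencejobrun",
}


def matching_rules(compartment_ocid):
    """Return one matching rule per resource type, scoped to one compartment."""
    return [
        f"ALL {{resource.type='{rtype}', resource.compartment.id='{compartment_ocid}'}}"
        for rtype in RESOURCE_TYPES.values()
    ]


# <compartment-ocid> is a placeholder for the OCID of data-science-work.
for rule in matching_rules("<compartment-ocid>"):
    print(rule)
```

Scoping each rule to the compartment OCID keeps resources from other compartments out of the group.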
Explanation for the policies:
- To allow notebook sessions to perform CRUD operations on entries in the model catalog, projects, and notebook session resources, add:
  allow dynamic-group data-science-dynamic-group to manage data-science-family in compartment data-science-work
- To allow notebook sessions to perform CRUD operations on Data Flow applications and runs, add:
  allow dynamic-group data-science-dynamic-group to manage dataflow-family in compartment data-science-work
- To allow notebook sessions to list and read compartments and user names that are in the tenancy, add:
  allow dynamic-group data-science-dynamic-group to read compartments in tenancy
  allow dynamic-group data-science-dynamic-group to read users in tenancy
- To allow model deployments to emit logs to the Logging service, add:
  allow dynamic-group data-science-dynamic-group to use log-content in compartment data-science-work
- To allow job runs to create logs and record job run details in the Logging service, add:
  allow dynamic-group data-science-dynamic-group to use log-groups in compartment data-science-work
- To allow notebook sessions and model deployments to read and write files to Object Storage buckets in the data-science-work compartment, add:
  allow dynamic-group data-science-dynamic-group to manage object-family in compartment data-science-work
- The preceding policy allows model deployments to access any bucket in the data-science-work compartment.
- To give model deployments read access to specific buckets outside the data-science-work compartment, specify the bucket names and their compartments in your policy.
- Example: To allow model deployments to access published conda environments from the bucket published-conda-envs, and model artifacts from the bucket model-artifacts, add:
  allow dynamic-group data-science-dynamic-group to read objects in compartment <another-compartment> where ANY {target.bucket.name='published-conda-envs', target.bucket.name='model-artifacts'}
- If your policy statements mention tenancy or include compartments outside the data-science-work compartment, then in the Create Policy dialog, for the Compartment option, select <your-tenancy> (root). This way, in addition to your compartment, the policy can include rules for other compartments in the tenancy.
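The bucket-scoped statement and the rule about where to create the policy can be sketched in a few lines. The helper names bucket_read_policy and policy_attachment are hypothetical, and the check only looks at the literal text of each statement, so treat it as a rough aid rather than a policy validator.

```python
import re


def bucket_read_policy(dynamic_group, compartment, buckets):
    """Build a read-objects statement limited to the named buckets."""
    conditions = ", ".join(f"target.bucket.name='{b}'" for b in buckets)
    return (f"allow dynamic-group {dynamic_group} to read objects "
            f"in compartment {compartment} where ANY {{{conditions}}}")


def policy_attachment(statements, home="data-science-work"):
    """Return 'root' if any statement reaches beyond the home compartment.

    A statement reaches beyond it when it says 'in tenancy' or names a
    compartment other than home.
    """
    for statement in statements:
        if re.search(r"\bin tenancy\b", statement):
            return "root"
        match = re.search(r"\bin compartment (\S+)", statement)
        if match and match.group(1) != home:
            return "root"
    return home


statement = bucket_read_policy(
    "data-science-dynamic-group",
    "<another-compartment>",
    ["published-conda-envs", "model-artifacts"],
)
print(statement)
print(policy_attachment([statement]))
```

Here the statement names a compartment other than data-science-work, so the sketch reports that the policy belongs in the root compartment.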
6. Creating a Notebook Session
Lastly, create a notebook session and test its access to the public internet.
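One way to test internet access from inside the notebook session is a short request with Python's standard library. This is a minimal sketch: the function name has_internet_egress is hypothetical, and the default URL is an arbitrary public site chosen as an assumption, so substitute any endpoint your notebooks actually need to reach.

```python
import urllib.error
import urllib.request


def has_internet_egress(url="https://www.oracle.com", timeout=5,
                        opener=urllib.request.urlopen):
    """Return True if an HTTP(S) request to url succeeds within timeout.

    opener can be replaced to try the function without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failure, no route, timeout, or an HTTP error status.
        return False


# In a notebook cell, run:
# has_internet_egress()
```

A False result from a session on a private subnet usually points to a missing NAT gateway route or a security rule that blocks egress.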
What's Next
You have successfully set up a Data Science tenancy and created a Data Science project that includes a notebook session. You can now proceed to the following tasks: