Oracle Cloud Infrastructure (OCI) Data Science is a fully managed and serverless platform for data
science teams to build, train, and manage machine learning models.
The Data Science service:
Provides data scientists with a collaborative, project-driven workspace.
Enables self-service, serverless access to infrastructure for data science workloads.
Includes Python-centric tools, libraries, and packages developed by the open source
community and the Oracle Accelerated Data Science Library, which supports the
end-to-end lifecycle of predictive models:
Data acquisition, profiling, preparation, and visualization.
Feature engineering.
Model training (including Oracle AutoML).
Model evaluation, explanation, and interpretation (including Oracle MLX).
Integrates with the rest of the Oracle Cloud Infrastructure stack,
including Functions, Data Flow, Autonomous Data Warehouse, and Object Storage.
Provides model deployments as managed resources for deploying models as web applications (HTTP API
endpoints).
Data Science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure.
Pipelines enable you to run end-to-end machine learning workflows.
Includes policies and vaults to control access to compartments and resources.
Includes metrics that provide insight into the health, availability, performance, and
usage of your Data Science resources.
Helps data scientists concentrate on methodology and domain expertise to deliver models to
production.
The Oracle Accelerated Data Science (ADS) SDK is a Python library that's included as part of the OCI Data Science service. ADS has many functions and objects that automate or simplify the steps in the Data Science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides an interface to access the Data Science service model catalog and other OCI services including Object Storage. To familiarize yourself with ADS, see the Accelerated Data Science Library.
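For orientation, the following is a minimal sketch of using ADS from a notebook session. It assumes the ads package is preinstalled (as it is in the Data Science conda environments) and that resource principals are enabled for the session; the Object Storage path, target column, and the DatasetFactory interface are illustrative and may differ across ADS versions.

```python
# Minimal ADS sketch, assuming a Data Science notebook session where the
# "ads" package is preinstalled and resource principal auth is enabled.
# The Object Storage path and target column are hypothetical placeholders,
# and DatasetFactory reflects older ADS releases; check the ADS docs for
# your installed version.
import ads
from ads.dataset.factory import DatasetFactory

# Authenticate to OCI services with the notebook session's resource principal.
ads.set_auth(auth="resource_principal")

# Open a dataset and declare the prediction target (placeholder path and column).
ds = DatasetFactory.open("oci://my-bucket@my-namespace/churn.csv", target="churned")

ds.show_in_notebook()                              # profile and visualize the data
train, test = ds.train_test_split(test_size=0.2)   # split for model training
```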
Data Science notebook sessions are interactive coding environments for building and training models. Notebook sessions come with many preinstalled open source and Oracle-developed machine learning and data science packages.
Conda is an open source environment and package management system that was created for Python programs. It installs, runs, and updates packages and their dependencies, and makes it easy to create, save, load, and switch between environments on your local computer.
Model deployments are a managed resource in the Data Science service for deploying models stored in the model catalog as HTTP endpoints. Deploying machine learning models as web applications (HTTP API endpoints) that serve predictions in real time is the most common way to put models into production. HTTP endpoints are flexible and can serve requests for model predictions.
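As a sketch, a deployed model can be invoked over HTTPS with a signed request. The endpoint URL and payload below are placeholders; the real invocation URL comes from the model deployment's details, and the payload format depends on the model's scoring script.

```python
# Sketch: call a model deployment's /predict endpoint using the OCI Python SDK
# request signer. The endpoint URL and payload are placeholders; copy the real
# invocation URL from your model deployment's details.
import oci
import requests

config = oci.config.from_file()            # reads ~/.oci/config (DEFAULT profile)
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment-ocid>/predict"
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}  # shape depends on your model's scoring script

response = requests.post(endpoint, json=payload, auth=signer)
print(response.json())
```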
OCI provides SDKs that interact with Data Science
without the need to create a framework.
The CLI provides both quick access and full
functionality without the need for programming.
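For example, here is a short sketch with the OCI Python SDK that lists the Data Science projects in a compartment; the compartment OCID is a placeholder.

```python
# List Data Science projects in a compartment with the OCI Python SDK.
# The compartment OCID below is a placeholder.
import oci

config = oci.config.from_file()  # default profile from ~/.oci/config
data_science = oci.data_science.DataScienceClient(config)

projects = data_science.list_projects(
    compartment_id="ocid1.compartment.oc1..exampleuniqueid"
).data

for project in projects:
    print(project.display_name, project.lifecycle_state)
```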
Regions and Availability Domains 🔗
OCI services are hosted in regions and availability domains. A region is a localized geographic area, and an availability domain is one or more data centers found in that
region.
Data Science is hosted in all regions where OCI is available.
Limits on Data Science Resources 🔗
When you sign up for OCI, a set of service limits is configured for your tenancy. A service limit is the quota or allowance set on a resource.
Failed and inactive notebook sessions and models count against your service limits. Only
when you fully stop an instance or delete a model is it not counted toward your quota.
GPU limits are set to zero by default, so ask your administrator to increase the
limits if you want to use GPUs.
By default, every tenancy can create up to 1000 jobs.
You can increase this limit by filing a CAM service request ticket.
The number of simultaneous job runs is limited by your Data Science core count limits.
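As an illustration, you can query your current Data Science limits with the OCI Python SDK; the service name string is an assumption to verify in the Limits, Quotas, and Usage page of the Console.

```python
# Sketch: query service limits for Data Science with the OCI Python SDK.
# The service name "data-science" is an assumption; verify it in the
# Limits, Quotas, and Usage page of the Console.
import oci

config = oci.config.from_file()
limits = oci.limits.LimitsClient(config)

values = limits.list_limit_values(
    compartment_id=config["tenancy"],   # service limits are set at the tenancy level
    service_name="data-science",
).data

for value in values:
    print(value.name, value.value)
```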
Resource Identifiers 🔗
Most types of OCI resources have an Oracle-assigned unique ID called an Oracle Cloud Identifier (OCID).
The OCID is included as part of the resource's information in both the Console and API. For information about the OCID format and other ways to identify resources, see Resource Identifiers.
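For illustration only, an OCID is a dotted string of the form ocid1.<resource type>.<realm>.[region].<unique ID>, so its components can be pulled apart directly; the OCID below is fabricated.

```python
# Illustration only: split an OCID into its dotted components.
# The OCID shown is a fabricated placeholder, not a real resource.
# Some OCIDs (for example, compartments) leave the region segment empty.
ocid = "ocid1.datasciencenotebooksession.oc1.iad.exampleuniqueid"
parts = ocid.split(".")
print(parts[1])  # resource type, e.g. "datasciencenotebooksession"
print(parts[2])  # realm, e.g. "oc1"
print(parts[3])  # region, e.g. "iad"
```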
Authentication and Authorization 🔗
Each service in OCI integrates with Identity and Access Management for access to cloud
resources through all interfaces (the OCI Console, SDKs, REST APIs, or the CLI).
An administrator in your organization must set up tenancies, groups, compartments, and
policies that control who can access which services and resources and the type of
access. Your administrator confirms which compartments you should be using.
Use policies to grant access to create and manage Data Science projects or to launch notebook
sessions.
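For example, an administrator might grant a group of data scientists access with a policy like the following sketch, written with the OCI Python SDK; the group name, compartment name, and OCID are placeholders, and the exact statements depend on your security requirements.

```python
# Sketch: create an IAM policy that lets a group manage Data Science resources.
# The group name, compartment name, and compartment OCID are placeholders.
import oci

config = oci.config.from_file()
identity = oci.identity.IdentityClient(config)

policy = identity.create_policy(
    oci.identity.models.CreatePolicyDetails(
        compartment_id="ocid1.compartment.oc1..exampleuniqueid",
        name="data-science-access",
        description="Allow the data-scientists group to use Data Science",
        statements=[
            "allow group data-scientists to manage data-science-family in compartment my-compartment",
            "allow group data-scientists to use virtual-network-family in compartment my-compartment",
        ],
    )
).data

print(policy.id, policy.lifecycle_state)
```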
Provisioning and Pricing 🔗
The Data Science service offers a serverless experience for
model development and deployment. When you create Data Science
resources such as notebook sessions, models, model deployments, and jobs, the underlying Compute
and storage infrastructure is provisioned and maintained for you.
You pay for the use of the underlying infrastructure (Block Storage, Compute, and Object
Storage). Review the detailed pricing list for Data Science
resources.
You only pay for the infrastructure while you're using it with Data Science resources:
Notebook Sessions
Notebook sessions are serverless, and all underlying infrastructure is
service-managed.
When creating a notebook session, you select the VM shape (the machine type, CPU or GPU,
and the number of OCPUs or GPUs) and the amount of block storage (minimum of 50 GB); a sketch of
creating a notebook session with the OCI Python SDK follows this list.
While a notebook session is active, you pay for Compute and Block Storage at the standard
Oracle Cloud Infrastructure rates; see Deactivating Notebook Sessions.
You can deactivate the notebook session, which shuts down the Compute but retains the
Block Storage. In this case, you're no longer charged for Compute, but you continue to pay
for the Block Storage. This also applies to notebook sessions with a GPU instance: they
aren't metered for Compute when they're deactivated.
When you delete a notebook session, you're no longer charged for Compute or Block Storage;
see Deleting a Notebook Session.
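The following is the notebook session creation sketch referenced above, using the OCI Python SDK; the shape name, block storage size, and all OCIDs are placeholders, and flexible shapes need additional shape configuration details.

```python
# Sketch: create a notebook session, choosing the VM shape and block storage size.
# Project, compartment, and subnet OCIDs and the shape name are placeholders.
import oci

config = oci.config.from_file()
data_science = oci.data_science.DataScienceClient(config)

details = oci.data_science.models.CreateNotebookSessionDetails(
    display_name="my-notebook-session",
    project_id="ocid1.datascienceproject.oc1..exampleuniqueid",
    compartment_id="ocid1.compartment.oc1..exampleuniqueid",
    notebook_session_configuration_details=oci.data_science.models.NotebookSessionConfigurationDetails(
        shape="VM.Standard2.1",
        block_storage_size_in_gbs=50,   # 50 GB is the minimum
        subnet_id="ocid1.subnet.oc1..exampleuniqueid",
    ),
)

notebook_session = data_science.create_notebook_session(details).data
print(notebook_session.id, notebook_session.lifecycle_state)
```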
Models
When you save a model to the model catalog, you're charged for the storage of the model
artifact at the standard Object Storage rates in terms of GB per month.
When you delete a model, you're no longer charged; see Deleting a Model.
Model Deployments
When you deploy a model, you select the shape type and the number of replicas hosting the
model servers. You can also select the load balancer bandwidth associated with the
deployment; a sketch of these options with the OCI Python SDK follows this list.
When a model deployment is active, you pay for the VMs that are hosting the model servers
and the load balancer at the standard OCI
rates.
When you deactivate a model deployment, you're no longer charged for the VMs or the load
balancer. If you reactivate a model deployment, billing resumes for both the VMs and the
load balancer.
When you delete a model deployment, you're no longer charged for the infrastructure
associated with the model deployment.
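The following is the model deployment sketch referenced above, using the OCI Python SDK; all OCIDs, the shape name, the replica count, and the bandwidth value are placeholders.

```python
# Sketch: deploy a model from the model catalog, choosing the shape, replica
# count, and load balancer bandwidth. All OCIDs and the shape name are
# placeholders; see the SDK docs for flex shapes and logging options.
import oci

config = oci.config.from_file()
data_science = oci.data_science.DataScienceClient(config)
models = oci.data_science.models

details = models.CreateModelDeploymentDetails(
    display_name="my-model-deployment",
    project_id="ocid1.datascienceproject.oc1..exampleuniqueid",
    compartment_id="ocid1.compartment.oc1..exampleuniqueid",
    model_deployment_configuration_details=models.SingleModelDeploymentConfigurationDetails(
        model_configuration_details=models.ModelConfigurationDetails(
            model_id="ocid1.datasciencemodel.oc1..exampleuniqueid",
            instance_configuration=models.InstanceConfiguration(
                instance_shape_name="VM.Standard2.1",
            ),
            scaling_policy=models.FixedSizeScalingPolicy(instance_count=1),
            bandwidth_mbps=10,   # load balancer bandwidth
        ),
    ),
)

deployment = data_science.create_model_deployment(details).data
print(deployment.id, deployment.lifecycle_state)
```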
Jobs
Jobs don't incur a premium cost for using the service; you pay only for the underlying
infrastructure used, and only for the duration of the job artifact's execution.
Metering starts when the job artifact begins running and stops when the code exits. You
don't pay for the time spent provisioning or deprovisioning the infrastructure.
Metering includes the CPU or GPU consumption per OCPU while the job artifact is running,
and the Block Storage size used by the job.
Using the Logging service with jobs doesn't incur an extra cost.
Pipelines
Pipelines are billed for the underlying Compute and Block Storage that the pipeline uses
to run the pipeline step code.
There is no extra charge for the orchestration or artifact storage.