Create a model deployment with the default networking option.
The workload is attached, by using a secondary VNIC, to a preconfigured, service-managed VCN
and subnet. The provided subnet gives access to other Oracle Cloud services through a service
gateway, but not to the public internet.
If you need access only to OCI services, we recommend this option because it doesn't require
you to create networking resources or write policies for networking permissions.
You can create and run default networking model deployments using the Console, the OCI
Python SDK, the OCI CLI, or the Data Science API.
From the model deployments page, select Create model
deployment. If you need help finding the list of model
deployments, see Listing Model Deployments.
(Optional)
Enter a unique name for the model deployment (limit of 255 characters). If you don't provide a name, one is generated automatically.
For example, modeldeployment20200108222435.
(Optional)
Enter a description (limit of 400 characters) for the model deployment.
(Optional)
Under Default configuration, enter a custom environment
variable key and corresponding value. Select + Additional custom
environment key to add more environment variables.
In the Models section, select
Select to choose an active model to deploy from the
model catalog.
Find a model by using the default compartment and project, or by
selecting Using OCID and searching for the model
by entering its OCID.
Select the model.
Select Submit.
Important
Model artifacts that exceed 400 GB aren't supported for
deployment. Select a smaller model artifact for deployment.
(Optional)
Change the Compute shape by selecting
Change shape. Then, follow these steps in the
Select compute panel.
Select the shape that best suits how you want to use the resource. For
the AMD shape, you can use the default or set the number of OCPUs and
memory.
For each OCPU, select up to 64 GB of memory and a maximum total of
512 GB. The minimum amount of memory allowed is either 1 GB or a
value matching the number of OCPUs, whichever is greater.
Select Select shape.
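For a flexible shape, the memory rule above can be sketched as a quick check. This is an illustrative snippet, not part of the product; the OCPU count is an example value:

```shell
# Sketch of the documented memory limits for a flexible shape:
# up to 64 GB per OCPU, 512 GB total maximum, and a minimum of
# 1 GB or the OCPU count, whichever is greater.
ocpus=4   # example value; set this to your chosen OCPU count
min_mem=$(( ocpus > 1 ? ocpus : 1 ))
max_per_ocpu=$(( ocpus * 64 ))
max_mem=$(( max_per_ocpu < 512 ? max_per_ocpu : 512 ))
echo "OCPUs: $ocpus, memory range: ${min_mem}-${max_mem} GB"
```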
Enter the number of instances to replicate the model on.
Select Default networking to configure the network
type.
Select one of the following options to configure the endpoint type:
Public endpoint: Data access in a managed instance from
outside a VCN.
Private endpoint: The private endpoint that you want to
use for the model deployment.
If you selected Private endpoint, select the private endpoint to
use from the list of private endpoints in Data Science.
Select Change compartment to
select the compartment containing the private endpoint.
For access logs, select a compartment, log group, and log name.
For predict logs, select a compartment, log group, and log name.
Select Submit.
(Optional)
Select Show Advanced Options to configure tags and other optional settings.
(Optional)
Select the serving mode for the model deployment, either as an HTTPS
endpoint or using a Streaming service
stream.
(Optional)
Select the load balancing bandwidth in Mbps or use the 10 Mbps
default.
Tips for load balancing:
If you know the common payload size and the frequency of requests per second, you can use the following formula to estimate the bandwidth of the load balancer that you need. We recommend that you add an extra 20% to account for estimation errors and sporadic peak traffic.
(Payload size in KB) * (Estimated requests per second) * 8 / 1024
For example, if the payload is 1,024 KB and you estimate 120 requests per second, then the recommended load balancer bandwidth would be (1024 * 120 * 8 / 1024) * 1.2 = 1152 Mbps.
Remember that the maximum supported payload size is 10 MB when dealing with image payloads.
If the request payload size exceeds the allocated load balancer bandwidth, the request is rejected with a 429 status code.
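The sizing formula above can be checked with a quick calculation. The payload size and request rate here are the example values from the text:

```shell
# Estimate load balancer bandwidth per the formula above, with the
# recommended 20% headroom factor. Payload size and request rate
# are example values; substitute your own measurements.
payload_kb=1024        # common payload size in KB
requests_per_sec=120   # estimated requests per second
awk -v p="$payload_kb" -v r="$requests_per_sec" \
    'BEGIN { printf "Recommended bandwidth: %.0f Mbps\n", (p * r * 8 / 1024) * 1.2 }'
```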
(Optional)
Select Use a custom container image and enter
the following:
Repository in <tenancy>: The repository that contains the custom image.
Image: The custom image to use in the model deployment at runtime.
CMD: More commands to run when the container starts. Add one instruction per text box. For example, if CMD is ["--host", "0.0.0.0"], pass --host in one text box and 0.0.0.0 in another. Don't include quotation marks.
Entrypoint: One or more entry point files to run when the container starts. For example, /opt/script/entrypoint.sh. Don't include quotation marks.
Server port: The port that the web server serving the inference runs on. The default is 8080. The port can be any value between 1024 and 65535. Don't use ports 24224, 8446, or 8447.
Health check port: The port that the container HEALTHCHECK listens on. Defaults to the server port. The port can be any value between 1024 and 65535. Don't use ports 24224, 8446, or 8447.
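The port constraints above can be expressed as a small check. This is a hypothetical helper for illustration, not part of the product:

```shell
# Hypothetical helper: succeeds only if the port is in the allowed
# 1024-65535 range and is not one of the disallowed ports
# (24224, 8446, 8447).
valid_port() {
  case "$1" in
    24224|8446|8447) return 1 ;;
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

valid_port 8080 && echo "8080: allowed"
valid_port 8446 || echo "8446: disallowed"
```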
(Optional)
Select the Tags tab, and then enter the tag
namespace (for a defined tag), key, and value to assign tags to the
resource.
To add more than one tag, select Add tag.
Tagging describes the
various tags that you can use to organize and find resources, including
cost-tracking tags.
Select Create.
You can use the OCI CLI to create a model deployment as in
this example.
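A minimal sketch of that CLI call follows. All OCIDs are placeholders, and the shape, scaling, and bandwidth values are example settings; substitute your own, and run the final command only with a configured OCI CLI profile and the required policies in place:

```shell
# Write the model deployment configuration (placeholder OCIDs,
# example shape and scaling values).
cat > deployment-config.json <<'EOF'
{
  "deploymentType": "SINGLE_MODEL",
  "modelConfigurationDetails": {
    "modelId": "ocid1.datasciencemodel.oc1..example",
    "instanceConfiguration": {
      "instanceShapeName": "VM.Standard.E4.Flex",
      "modelDeploymentInstanceShapeConfigDetails": {
        "ocpus": 1,
        "memoryInGBs": 16
      }
    },
    "scalingPolicy": { "policyType": "FIXED_SIZE", "instanceCount": 1 },
    "bandwidthMbps": 10
  }
}
EOF

# Validate the JSON before submitting it.
python3 -m json.tool deployment-config.json > /dev/null && echo "config OK"

# Submit (requires OCI CLI credentials; uncomment to run):
# oci data-science model-deployment create \
#   --compartment-id "ocid1.compartment.oc1..example" \
#   --project-id "ocid1.datascienceproject.oc1..example" \
#   --display-name "my-model-deployment" \
#   --model-deployment-configuration-details file://deployment-config.json
```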