Model Deployment

Follow these steps to deploy models with AI Quick Actions.

Model Deployment Creation

You can create a Model Deployment from foundation models tagged Ready to Deploy in the Model Explorer, or from fine-tuned models. When you create a Model Deployment in AI Quick Actions, you're creating an OCI Data Science Model Deployment, which is a managed resource in the OCI Data Science Service. The model is deployed as an HTTP endpoint in OCI.

You need the necessary policy to use Data Science Model Deployment. When you create the deployment, you select the compute shape, and you can set up logging to monitor it. Logging is optional but highly recommended to help troubleshoot errors with your Model Deployment; enabling logging also requires the necessary policy. See Model Deployment Logs for more information on logs. Under advanced options, you can select the number of instances to deploy and the Load Balancer bandwidth.

See Model Deployment on GitHub for more information about, and tips on, deploying models.

    1. Navigate to the Model Explorer.
    2. Select the model card for the model you want to deploy.
    3. Select Deploy to deploy the model.
      The Deploy model page is displayed.
      1. Give the deployment a name.
      2. Select a compute shape.
      3. Optional: Select a log group.
      4. Optional: Select a predict and access log.
      5. Optional: Under Inference mode, select the endpoint the deployment serves: the default /v1/completions or /v1/chat/completions.
        Note

        To use an image payload with a multimodal model, you must select the Inference mode /v1/chat/completions (example payloads for both modes are shown after these steps).
      6. Optional: Select a private endpoint.
        Note

        A private endpoint must be created as a prerequisite for the model deployment resource.

        The private endpoint feature for model deployment is only enabled in the OC1 realm. For other realms, create a service request for Data Science.

      7. Select Show advanced options.
      8. Update the instance count and the Load Balancer bandwidth.
      9. Optional: Under Inference container select an inference container.
      10. Select Deploy.
    4. Under AI Quick Actions, select Deployments.
      The list of model deployments is displayed. For the deployment created in step 3, wait for the Lifecycle state to become Active before selecting it to use it.
    5. Scroll to display the Inference Window.
    6. Enter text in Prompt to test the model.
    7. Optional: Adjust the model parameters as appropriate.
    8. Select Generate.
      The output is displayed in Response.
  • For a complete list of parameters and values for AI Quick Actions CLI commands, see AI Quick Actions CLI.

  • This task can't be performed using the API.
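
The inference mode selected when you deploy the model determines the request body the deployment accepts. As a rough illustration only, assuming an OpenAI-compatible inference container such as vLLM (the exact schema depends on the inference container you select, and the model name shown here is an assumed placeholder), the two modes accept payloads along these lines:

    # /v1/completions: plain text completion (the default inference mode).
    completions_payload = {
        "model": "odsc-llm",          # assumed model name; check your deployment's configuration
        "prompt": "What is OCI Data Science?",
        "max_tokens": 250,
        "temperature": 0.7,
    }

    # /v1/chat/completions: chat messages, using the OpenAI-compatible chat
    # schema; required for image payloads with multimodal models. The image
    # is passed as a base64 data URL in this sketch.
    chat_payload = {
        "model": "odsc-llm",          # assumed model name; check your deployment's configuration
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url",
                     "image_url": {"url": "data:image/png;base64,<base64_image>"}},
                ],
            }
        ],
        "max_tokens": 250,
    }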

Invoke Model Deployment in AI Quick Actions

You can invoke a model deployment in AI Quick Actions from the CLI or the Python SDK.

For more information, see the section on model deployment tips on GitHub.
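
As a minimal sketch, the following Python example sends a signed HTTPS request to a deployment using the OCI SDK's request signer. The endpoint URI below is a placeholder; copy the actual URI from the deployment's detail page, and adjust the payload to match the inference mode you selected.

    import oci
    import requests

    # Signer built from the default ~/.oci/config profile; resource principal
    # or instance principal authentication can be substituted.
    config = oci.config.from_file()
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
        pass_phrase=config.get("pass_phrase"),
    )

    # Placeholder endpoint; use the URI shown on the deployment detail page.
    endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment_ocid>/predict"

    payload = {"prompt": "What is OCI Data Science?", "max_tokens": 250}

    response = requests.post(endpoint, json=payload, auth=signer, timeout=60)
    response.raise_for_status()
    print(response.json())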

Model Artifacts

Where to find model artifacts.

When a model is downloaded into a Model Deployment instance, it's downloaded to the /opt/ds/model/deployed_model/<object_storage_folder_name_and_path> folder.
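
For example, an illustrative snippet that could be run from inside the running deployment container (such as from a custom score.py) to list the downloaded artifacts:

    import os

    # Documented download location for model artifacts inside the Model
    # Deployment instance; the subfolder mirrors the Object Storage path.
    ARTIFACT_ROOT = "/opt/ds/model/deployed_model"

    for root, _dirs, files in os.walk(ARTIFACT_ROOT):
        for name in files:
            print(os.path.join(root, name))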