Custom Scaling Metric Type to Configure Autoscaling
Use the custom metric type option to configure autoscaling.
Use the custom scaling metric option to build an MQL query from any of the metrics emitted by the model deployment resource, and then use that query to configure autoscaling. This approach lets you create more sophisticated queries, such as combining several queries with AND or OR, applying different aggregation functions, and choosing an evaluation window. This option gives you greater control over the scaling conditions, enabling a more tailored and precise setup.
When formulating an MQL query, include {resourceId =
"MODEL_DEPLOYMENT_OCID"} in the query, as shown in the examples provided. While
processing the request, the service replaces the
MODEL_DEPLOYMENT_OCID placeholder with the actual resource OCID, so it
retrieves exactly the set of metrics associated with that resource.
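For example, a minimal query with the placeholder might look like the following sketch. The metric name PredictRequestCount, the one-minute interval, and the threshold are illustrative; substitute the metric and values for your use case:

```
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.sum() > 100
```

The service substitutes the deployment's OCID for MODEL_DEPLOYMENT_OCID before the query is evaluated.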
Testing Custom Metric MQL Queries
Follow these steps to test and complete the queries.
1. Select the metric chart for the metric you want to use.
2. Select Options.
3. Navigate to View Query in MQL Explorer.
4. Select Edit Queries.
5. Select Advanced Mode.
6. In the Query code editor, update and test the query for scale-out and scale-in operations.
7. Use these tested queries to create model deployments with autoscaling capabilities.
Example Queries
The following are sample queries for metrics you can use to enable autoscaling.
Note
These queries are provided for reference and can be customized based on the specific
use case. However, these queries can also be used without modification.
If no predict calls are made, then no metrics are emitted. In such cases, you
must incorporate the absent() function into the
alarm query so the alarm can still evaluate when minimal or no
predict calls are made.
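A sketch of such a query, assuming the PredictRequestCount metric, joins the main condition with an absent() check so the alarm fires even when no data points are emitted (names and thresholds are illustrative):

```
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.sum() < 5 || PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.absent()
```

The absent() branch covers the case where the metric has no data points at all, which a plain threshold comparison would not match.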
Use the provided metric and queries for scaling in response to predict request
volume.
If the total count of prediction requests to the specific model deployment
exceeds 100 within a one-minute time window and this condition persists for the
specified pending duration time, it triggers a scale-out operation.
Similarly, if the cumulative count is less than 5, or if there are no requests at
all, and this situation continues for the pending duration time, the condition
begins a scale-in operation.
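Illustrative scale-out and scale-in queries for this scenario, assuming the PredictRequestCount metric (adjust the metric name, interval, and thresholds to your environment):

```
# Scale-out: more than 100 predict requests in a one-minute window
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.sum() > 100

# Scale-in: fewer than 5 requests, or no requests at all
PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.sum() < 5 || PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.absent()
```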
Use this metric and these queries to scale based on predict request
latencies.
The query evaluates the 99th percentile of PredictLatency for a specific model
deployment over a 1-minute period. If this 99th percentile latency value exceeds
120 milliseconds and persists for the pending duration time, the condition is met,
triggering a scale-out operation.
Conversely, if the 99th percentile is less than 20 milliseconds for the pending
duration time, a scale-in operation is started.
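Sketches of the corresponding queries, assuming the PredictLatency metric reports milliseconds and that percentile() takes a fraction between 0 and 1 (verify both against your metric's definition):

```
# Scale-out: p99 latency above 120 ms over a one-minute window
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.percentile(0.99) > 120

# Scale-in: p99 latency below 20 ms
PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.percentile(0.99) < 20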
Use this metric and queries to implement scaling based on predict response
success rate.
The MQL query evaluates the percentage of successful PredictResponses compared to
all PredictResponses within a 1-minute interval for a specific model
deployment.
If this percentage is less than 95 and persists for the pending duration time,
the condition triggers a scale-out operation. Conversely, if the percentage is
more than 95 for the pending duration time, the condition starts a scale-in
operation.
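One way to express the success-rate condition is to divide a filtered sum by the total sum, assuming a PredictResponse metric with a status dimension; the dimension name and value here are assumptions, so check them against the dimensions your deployment actually emits:

```
# Scale-out: successful responses fall below 95% of all responses in a one-minute window
(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", status = "Success"}.sum() * 100) / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.sum() < 95
```

Reversing the comparison (> 95) gives the scale-in condition described above.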
Creating a Model Deployment with Autoscaling Using a Custom Metric
Learn how to create a model deployment with an autoscaling policy using a custom
metric.
From the model deployments page, select Create model
deployment. If you need help finding the list of model
deployments, see Listing Model Deployments.