To help troubleshoot Autoscaling issues, see, Debugging a Model Deployment Failure.
The autoscaling operation uses the update work request type, hence, in the event of an error,
scrutinize the failed update work requests. Any occurring errors are presented in
Error Messages. Review the specifics of the error and take
appropriate action.
Service Timed Out during Operation
When creating, updating, or activating model deployment with an autoscaling type
scaling policy, the operation fails with a Service Timed Out error.
The system checks for the presence of an IAM
policy in the customer tenancy for the metrics retrieval by autoscaling service.
A missing policy might lead to the service timing out.
When deploying a model with autoscaling using a custom metric MQL query, the
operation fails with the error:
Failed to provision compute resources because of an invalid parameter in the
request. Invalid TQL query.
An incorrect or invalid query.
Check the work request error messages. Correct the query and test it
in the Metrics Explorer
before creating or updating the model deployment using those queries.
Scaling isn't Triggered or Takes too Long to Complete 🔗
Scaling isn't triggered or takes too long complete.
The model deployment scaling operation operates on a best-effort basis. Following
the creation of the model deployment with autoscaling enabled, a cool-down
period is enforced. Autoscaling triggers only after the cool-down time has
finished.
Note
This cool-down period is applied after creation or update, and
between each scaling event. This cool-down period also resets after every
user-performed update operation.
When the scaling operation has
started on the model deployment, the completion of the process might take up to
six minutes on average. The duration of the scaling time is predominantly
influenced by the size of the artifact. If the artifact size is large, the model
deployment might require more time to bootstrap, so extending the overall
duration of the scaling operation.
The model deployment might require more time to bootstrap, so
extending the overall duration of the scaling operation.