Model Deployments
Troubleshoot your model deployments.
Debugging a Model Deployment Failure
After creating a new deployment or updating an existing deployment, you might see a failure. These steps show how to debug the issue:
- On your project's home page, click Model Deployments.
- Select the model deployment name, or click the Actions menu for the model deployment and select View Details.
- Under Resources, select Work Requests.
The work requests appear at the bottom of the page. You can also inspect work requests with the OCI SDK, as sketched after these steps.
- On the Work Request Information page, click Log Messages.
- If any failures occur in the creation steps, under Resources, select Error Messages.
- If the work request shows success, then review the OCI predict logs to identify any errors.
Logs are attached to the model deployment when it's created.
- If logs are attached, select the predict log name to see the log.
- Select Explore with Log Search.
- Increase the filter time period to broaden the search window.
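If you prefer scripting these checks, the following is a minimal sketch using the OCI Python SDK; the work request OCID is a hypothetical placeholder, and a valid ~/.oci/config profile is assumed.

```python
# Sketch: inspect a model deployment work request with the OCI Python SDK.
# Assumes a valid ~/.oci/config profile; the OCID below is a placeholder.
import oci

config = oci.config.from_file()
ds_client = oci.data_science.DataScienceClient(config)

work_request_id = "ocid1.datascienceworkrequest.oc1..example"  # placeholder

# Overall status (for example ACCEPTED, IN_PROGRESS, SUCCEEDED, FAILED)
wr = ds_client.get_work_request(work_request_id).data
print(wr.status, wr.operation_type)

# Step-by-step messages, equivalent to Log Messages in the Console
for log in ds_client.list_work_request_logs(work_request_id).data:
    print(log.message)

# Failure details, equivalent to Error Messages under Resources
if wr.status == "FAILED":
    for err in ds_client.list_work_request_errors(work_request_id).data:
        print(err.code, err.message)
```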
Conda Environment Path isn't Accessible
Ensure that the conda environment path is valid, and that you have configured the appropriate policy for a published conda environment. The conda environment path must remain valid and accessible throughout the lifecycle of the model deployment to ensure availability and proper functioning of the deployed model.
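Because a published conda environment lives in Object Storage, one quick check is to confirm that the object behind the environment path is still readable with your credentials. This is a minimal sketch, assuming a path of the form oci://&lt;bucket&gt;@&lt;namespace&gt;/&lt;object&gt;; the namespace, bucket, and object names below are placeholders.

```python
# Sketch: confirm the conda environment pack behind an oci:// path is readable.
# Namespace, bucket, and object name are hypothetical placeholders.
import oci

config = oci.config.from_file()
os_client = oci.object_storage.ObjectStorageClient(config)

namespace = "my-namespace"           # from oci://bucket@namespace/...
bucket = "conda-envs"                # bucket holding the published environment
object_name = "path/to/env.tar.gz"   # object path portion of the conda URI

try:
    # head_object succeeds only if the object exists and is readable
    meta = os_client.head_object(namespace, bucket, object_name)
    print("Reachable, size:", meta.headers.get("Content-Length"))
except oci.exceptions.ServiceError as e:
    print("Not accessible:", e.status, e.message)
```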
Error Occurred when Starting the Web Server
Enable the model deployment predict logs to help you debug the errors. Generally, this happens when your code has issues or is missing required dependencies.
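One way to catch these issues before deploying is to import and call score.py locally in the same conda environment the deployment uses. A minimal sketch, assuming the standard load_model()/predict() contract and a hypothetical sample payload:

```python
# Sketch: smoke-test score.py locally before deploying.
# Run inside the same conda environment the deployment will use.
# sample_input is a hypothetical payload; use one your model expects.
import json
import score  # the score.py in your model artifact

model = score.load_model()  # fails fast on missing dependencies
sample_input = {"data": [[1.0, 2.0, 3.0]]}

result = score.predict(sample_input, model)
print(json.dumps(result, default=str))
```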
Failure when Invoking a Model Deployment
When a model deployment is in an active lifecycleState, the predict endpoint can be invoked. The prediction response can return a failure for many reasons. Use these suggestions to try to resolve these errors:
- Ensure that the input passed in the request is in a valid JSON format and matches the input expected by the model (see the invocation sketch after this list).
- Review the attached access logs for errors.
- Ensure that the user has the correct access rights.
- Ensure that the score.py file doesn't contain errors.
- If predictions return different results (success, failure) for the same input across calls, the allocated resources might not be enough to serve the model prediction. You can increase the load balancer bandwidth to handle more traffic and the Compute core count to serve more requests in parallel.
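For reference, here is a minimal sketch of invoking the predict endpoint with a signed request via the OCI Python SDK; the endpoint URL and payload are hypothetical placeholders, and the payload must be JSON the model actually expects.

```python
# Sketch: call a model deployment's predict endpoint with a signed request.
# The endpoint URI and payload below are hypothetical placeholders.
import oci
import requests

config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment-ocid>/predict"
payload = {"data": [[1.0, 2.0, 3.0]]}  # must match the model's expected input

response = requests.post(endpoint, json=payload, auth=signer)
print(response.status_code, response.text)
```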
Too Many Requests (Status 429)
If you're getting this error when calling the predict endpoint, the requests are being throttled. Increase the load balancer bandwidth of the model deployment to address this error. You can estimate the required bandwidth from the expected number of requests per second and the combined size of the request and response payload per request.
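As a rough illustration, the sketch below turns an assumed peak request rate and payload size into a bandwidth figure, and retries throttled calls with exponential backoff; the numbers, endpoint, signer, and payload are all placeholders.

```python
# Sketch: rough bandwidth estimate plus retry-with-backoff for HTTP 429.
# Rates, sizes, endpoint, auth, and payload are hypothetical placeholders.
import time
import requests

# Estimate: peak requests per second times combined payload size, in Mbps.
peak_rps = 50     # expected peak requests per second
payload_kb = 200  # request + response size per call, in KB
estimated_mbps = peak_rps * payload_kb * 8 / 1000
print(f"Suggested load balancer bandwidth: ~{estimated_mbps:.0f} Mbps")

def predict_with_backoff(endpoint, payload, auth, retries=5):
    """Retry throttled (429) predict calls with exponential backoff."""
    for attempt in range(retries):
        response = requests.post(endpoint, json=payload, auth=auth)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between retries
    return response
```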