Metric Details in Generative AI
You can monitor OCI Generative AI resources through the metrics provided in this service. You can also use the OCI Monitoring service to create custom queries and alarms to notify you when these metrics meet alarm-specified triggers.
Hosting Dedicated AI Cluster Metrics
This section lists the metrics for the hosting dedicated AI clusters. The fine-tuning dedicated clusters don't display metrics.
| Metric Display Name | Description |
|---|---|
| Utilization | The available capacity for a dedicated AI cluster, displayed as a percentage over time |
| Total number of input | Number of input tokens that the models on this hosting dedicated AI cluster have processed |
| Total number of output | Number of output tokens that the models on this hosting dedicated AI cluster have processed |
You can get the preceding metrics from a hosting dedicated AI cluster's detail page.
Endpoint Metrics
This section lists the metrics for model endpoints in Generative AI.
| Metric Display Name | Description |
|---|---|
| Total processing time | Total processing time for a call to finish |
| Number of calls | Number of calls made to the model that's hosted on this endpoint |
| Service Errors Count | Number of calls with a service internal error |
| Client Errors Count | Number of calls with a client-side error |
| Total number of input | Number of input tokens that the model that's hosted on this endpoint has processed |
| Total number of output | Number of output tokens that the model that's hosted on this endpoint has processed |
| Success rate of calls | Successful calls divided by the total number of calls |
You can get the preceding metrics from an endpoint's detail page.
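The success-rate metric above is a plain ratio of successful calls to total calls. As a minimal sketch of that arithmetic (the function name is illustrative, and treating both service and client errors as unsuccessful calls is an assumption, not a statement about how the service computes it):

```python
# Illustrative sketch: "Success rate of calls" as successful calls divided by
# the total number of calls. Assumes both service and client errors count
# as unsuccessful calls.
def success_rate(total_calls: int, service_errors: int, client_errors: int) -> float:
    """Return the fraction of calls that completed without an error."""
    if total_calls == 0:
        return 0.0  # avoid division by zero when no calls were made
    successful = total_calls - service_errors - client_errors
    return successful / total_calls

print(success_rate(200, 4, 6))  # 190 successful out of 200 calls -> 0.95
```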
Metrics for Custom Queries
You can create custom queries and alarms for the Generative AI cluster and endpoint metrics through the Monitoring service.
This section lists the parameters that you can use to create custom queries for Generative AI metrics by using the Monitoring service.
| Metric Parameter | Display Name | Description |
|---|---|---|
| ClientErrorCount | Client Errors Count | Number of calls with a client-side error |
| InputTokenCount | Total number of input | Number of input tokens that the models hosted on this resource have processed |
| InvocationLatency | Total processing time | Total processing time for a call to finish on this resource |
| OutputTokenCount | Total number of output | Number of output tokens that the models hosted on this resource have processed |
| ServerErrorCount | Service Errors Count | Number of calls with a service internal error |
| TotalInvocationCount | Number of calls | Number of calls |
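In a Monitoring custom query, these parameters appear in a Monitoring Query Language (MQL) expression. The sketches below are illustrative only; the `resourceId` dimension and the `<endpoint_OCID>` placeholder are assumptions, so confirm the metric namespace and available dimensions on the resource's metrics page before using them:

```
InvocationLatency[1m].mean()
TotalInvocationCount[1m].sum()
ServerErrorCount[1m]{resourceId = "<endpoint_OCID>"}.sum()
```

The first charts the average processing time per minute, the second the total number of calls per minute, and the third the per-minute server error count filtered to a single endpoint. The same expressions can serve as the query behind an alarm trigger.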
For the steps on how to create these custom queries, see Creating a Query for Generative AI Metrics.