By using metrics, you can monitor the endpoints in Generative AI Agents. Review the following topics for more information about these metrics.
Endpoint Metrics
This section lists the metrics for agent endpoints in Generative AI Agents. You can get the following metrics on an endpoint's details page.
Number of calls: Number of calls that the agent hosted on this endpoint has processed.
Total processing time (ms): Total time, in milliseconds, for a call to finish processing.
Service errors count: Number of calls that failed with an error on the service side.
Client errors count: Number of calls that failed with an error on the client side.
Total input characters consumed: Number of input characters that the agent hosted on this endpoint has processed.
Total output characters produced: Number of output characters that the agent hosted on this endpoint has produced.
Number of error traces: Number of traces with an error. This metric applies only if tracing is enabled for this endpoint.
Success rate: Successful calls divided by the total number of calls.
Tip
On an endpoint's details page in the Generative AI Agents service, select the Options menu in any of the endpoint metric charts to get the following options:
View Query in Metrics Explorer
Copy chart URL
Copy query in Monitoring Query Language (MQL)
Create an alarm on this query
Table View
Viewing Query in Metrics Explorer
The metrics explorer is a resource in the Monitoring service. To get permission to work with the Monitoring service resources, ask an administrator to review the IAM policies in Securing Monitoring and grant you the proper access for your role.
For each endpoint metric, select the Options menu in the metric's chart and then click View Query in Metrics Explorer. The following list shows the metric parameter and the Monitoring Query Language (MQL) expression used for each endpoint metric.
Number of calls (TotalInvocationCount): TotalInvocationCount[1m].count()
Total processing time (InvocationLatency): InvocationLatency[1m].mean()
Service errors count (ServerErrorCount): ServerErrorCount[1m].count()
Client errors count (ClientErrorCount): ClientErrorCount[1m].count()
Total input characters consumed (InputCharactersCount): InputCharactersCount[1m].sum()
Total output characters produced (OutputCharactersCount): OutputCharactersCount[1m].sum()
Number of error traces (ErrorTraceCount): ErrorTraceCount[1m].sum()
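To run one of these queries outside the console, you can call the Monitoring API directly. The following is a minimal sketch using the OCI Python SDK; the namespace name, compartment OCID, and time window are placeholder assumptions, so copy the exact query and namespace from the chart's Copy query in Monitoring Query Language (MQL) option or from Metrics Explorer.

```python
# Minimal sketch: running one of the endpoint MQL queries through the OCI
# Monitoring API with the Python SDK. The namespace name, compartment OCID,
# and time window are placeholder assumptions; copy the exact query and
# namespace from the chart's "Copy query in Monitoring Query Language (MQL)"
# option or from Metrics Explorer.
from datetime import datetime, timedelta, timezone

import oci

config = oci.config.from_file()  # reads the default ~/.oci/config profile
monitoring = oci.monitoring.MonitoringClient(config)

end_time = datetime.now(timezone.utc)
start_time = end_time - timedelta(hours=1)

details = oci.monitoring.models.SummarizeMetricsDataDetails(
    namespace="oci_generativeaiagent",         # assumed namespace name
    query="TotalInvocationCount[1m].count()",  # "Number of calls" from the list above
    start_time=start_time,
    end_time=end_time,
)

response = monitoring.summarize_metrics_data(
    compartment_id="ocid1.compartment.oc1..example",  # placeholder compartment OCID
    summarize_metrics_data_details=details,
)

for metric in response.data:
    for point in metric.aggregated_datapoints:
        print(point.timestamp, point.value)
```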
The success rate is calculated as the number of successful calls divided by the total number of calls.
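As a rough illustration of that calculation, the sketch below sums the call and error counts over a window and derives a rate. It reuses the client, time window, and placeholder namespace and compartment OCID from the previous example, and it assumes that a successful call is one that raised neither a service error nor a client error.

```python
# Rough sketch: deriving a success rate from the counts above. Reuses the
# `monitoring` client, time window, and placeholder namespace/compartment from
# the previous example, and assumes a "successful" call is one that raised
# neither a service error nor a client error.
def window_total(query: str) -> float:
    """Sum the datapoints returned for a count-style MQL query."""
    details = oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_generativeaiagent",  # assumed namespace name
        query=query,
        start_time=start_time,
        end_time=end_time,
    )
    data = monitoring.summarize_metrics_data(
        compartment_id="ocid1.compartment.oc1..example",  # placeholder compartment OCID
        summarize_metrics_data_details=details,
    ).data
    return sum(p.value for metric in data for p in metric.aggregated_datapoints)


total_calls = window_total("TotalInvocationCount[1m].count()")
failed_calls = (
    window_total("ServerErrorCount[1m].count()")
    + window_total("ClientErrorCount[1m].count()")
)
success_rate = (total_calls - failed_calls) / total_calls if total_calls else 0.0
print(f"Success rate over the window: {success_rate:.2%}")
```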
For each endpoint metric, select the Options menu in the metric's chart and then click Create an alarm on this query to open a prepopulated Create alarm page in the Monitoring service. Fill in the remaining fields to set an alarm for the metric that you selected.
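You can also create the alarm programmatically. The following sketch uses the OCI Python SDK's create_alarm operation; the namespace, OCIDs, threshold, and display name are placeholder assumptions, and the prepopulated console page remains the simplest path.

```python
# Sketch: creating an alarm on an endpoint metric query with the OCI Python
# SDK instead of the console wizard. The namespace, OCIDs, threshold, and
# display name are placeholder assumptions.
import oci

config = oci.config.from_file()
monitoring = oci.monitoring.MonitoringClient(config)

alarm_details = oci.monitoring.models.CreateAlarmDetails(
    display_name="agent-endpoint-service-errors",             # assumed alarm name
    compartment_id="ocid1.compartment.oc1..example",          # where the alarm lives
    metric_compartment_id="ocid1.compartment.oc1..example",   # where the metric is emitted
    namespace="oci_generativeaiagent",                         # assumed namespace name
    query="ServerErrorCount[1m].count() > 0",                  # alarm condition on the MQL query
    severity="WARNING",
    destinations=["ocid1.onstopic.oc1..example"],              # notifications topic OCID
    is_enabled=True,
)

alarm = monitoring.create_alarm(alarm_details).data
print(alarm.id, alarm.lifecycle_state)
```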