Service Mesh provides observability features, including metrics and logging.
Metrics
Installing Service Mesh gives you observability features that collect telemetry data
across the mesh and your applications. Because both inbound and outbound traffic flow
through the Service Mesh proxies, the proxies collect key operating statistics such as
request counts, failures, and latency.
What Metrics does Service Mesh Emit?
Service Mesh uses Envoy as its proxy. Envoy emits many statistics, depending on its
configuration. Generally, the statistics fall into three categories:
Downstream: Statistics about incoming connections and requests that the proxy receives.
Upstream: Statistics about outgoing connections and requests that the proxy makes.
Server: Statistics that describe how the Envoy instance itself is working, such as server uptime or the amount of allocated memory.
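To make the three categories concrete, the following Python sketch buckets Envoy stat
names by category. It is a heuristic for illustration, not how Envoy or Service Mesh
classifies stats: it assumes server stats start with server. (envoy_server_ in the
Prometheus output) and that downstream and upstream stats contain downstream_ or
upstream_, as in stock Envoy names such as http.<stat_prefix>.downstream_rq_total and
cluster.<name>.upstream_rq_total. The example stat names are illustrative.

```python
def classify_stat(name: str) -> str:
    """Heuristically bucket an Envoy stat name as downstream, upstream, server, or other."""
    # Server-wide stats (uptime, allocated memory, ...) share the "server" prefix,
    # both in raw form and in the Prometheus output.
    if name.startswith(("server.", "envoy_server_")):
        return "server"
    # Stats about connections/requests arriving at the proxy.
    if "downstream_" in name:
        return "downstream"
    # Stats about connections/requests the proxy makes to other services.
    if "upstream_" in name:
        return "upstream"
    return "other"


for stat in ("http.ingress_http.downstream_rq_total",
             "cluster.example_cluster.upstream_rq_total",
             "server.memory_allocated"):
    print(stat, "->", classify_stat(stat))
```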
For the list of metrics that Service Mesh proxies emit, refer to the Envoy statistics
documentation.
Envoy exposes metrics through the /stats/prometheus endpoint on its admin interface.
Prometheus instances can scrape this endpoint to collect Envoy metrics. To start
monitoring these metrics, you only need to install Prometheus with a scrape
configuration, plus Grafana. For setup instructions, see Add Application Monitoring
and Graphing Support.
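A Prometheus scrape is a plain HTTP GET that returns metrics in the Prometheus text
exposition format. The following Python sketch performs one such request and parses the
response into (name, labels, value) tuples. The admin address http://localhost:9901 is a
hypothetical placeholder; the admin port used by Service Mesh proxies is not stated here.
The parser is deliberately simplified: it skips # HELP and # TYPE comments, ignores
timestamps, and leaves escape sequences in label values as-is.

```python
import re
import urllib.request

# Hypothetical admin address: 9901 is only the port common in Envoy's own examples.
ADMIN_URL = "http://localhost:9901/stats/prometheus"

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One key="value" pair inside the braces.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(url=ADMIN_URL, timeout=5.0):
    """Fetch the endpoint once, as a Prometheus scrape would, and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

In practice, Prometheus does this scraping for you on a schedule; a one-off fetch like
this is mainly useful for checking which metrics and tags a proxy actually exposes.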
After you complete the setup, Prometheus scrapes the telemetry data emitted by the
Service Mesh proxies. You can then open Grafana through its service's external IP
address to query and graph the telemetry data collected in Prometheus.
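Grafana's Prometheus data source sends PromQL queries to Prometheus, and you can run the
same queries directly against the Prometheus HTTP API. The following Python sketch builds
an instant query for the /api/v1/query endpoint and converts the vector result into a
dictionary. The address http://localhost:9090 is a placeholder for your Prometheus
service, and the example query assumes stock Envoy metric and label names
(envoy_cluster_upstream_rq_total, envoy_cluster_name); adjust them to what your proxies
actually emit.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the address of your Prometheus service.
PROMETHEUS_URL = "http://localhost:9090"

# Example PromQL: per-cluster upstream request rate over the last 5 minutes.
REQUEST_RATE = "sum by (envoy_cluster_name) (rate(envoy_cluster_upstream_rq_total[5m]))"


def build_query_url(base_url, promql):
    """Build a GET URL for Prometheus's instant-query endpoint."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response body into {sorted label pairs: value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    if payload["data"].get("resultType") != "vector":
        raise ValueError("expected an instant vector result")
    result = {}
    for series in payload["data"]["result"]:
        key = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # value is [unix_time, "string number"]
        result[key] = float(value)
    return result


def query(promql, base_url=PROMETHEUS_URL, timeout=5.0):
    """Run one instant query and return the parsed vector."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

The same PromQL string can be pasted into a Grafana panel, which is usually the more
convenient place to graph it over time.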
As a starting point, consider monitoring the following service mesh metrics.
Service Mesh also adds tags to all stats exposed by Envoy, so you can filter metrics by
the mesh resources they are associated with. The tags include the following:
Mesh OCID
VirtualService OCID (if available for resource)
VirtualService Name (if available for resource)
Envoy Cluster Name (for cluster stats)
Deployment Type (either virtual_deployment or ingress_deployment)
Virtual Deployment Name (if deployment type is virtual_deployment)
Ingress Deployment Name (if deployment type is ingress_deployment)
In the example, the value aaa... is an abbreviation for the full
OCID value.
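Once samples are parsed into (name, labels, value) tuples, filtering by these tags is a
simple label match. In the following Python sketch, the label keys deployment_type and
virtual_deployment_name, and all deployment names, are hypothetical; check a scrape of
your own proxies for the exact tag keys Service Mesh uses. Only the deployment type
values virtual_deployment and ingress_deployment come from the list above.

```python
from typing import Dict, List, Tuple

# (metric name, labels, value), as produced by a Prometheus text-format parser.
Sample = Tuple[str, Dict[str, str], float]


def filter_by_tags(samples: List[Sample], **wanted: str) -> List[Sample]:
    """Keep only samples whose labels contain every wanted key=value pair."""
    return [s for s in samples if all(s[1].get(k) == v for k, v in wanted.items())]


def sum_by_tag(samples: List[Sample], tag: str) -> Dict[str, float]:
    """Add up sample values grouped by one tag; samples without the tag go under ''."""
    totals: Dict[str, float] = {}
    for _name, labels, value in samples:
        key = labels.get(tag, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals


# Hypothetical label keys and values, for illustration only.
samples: List[Sample] = [
    ("envoy_cluster_upstream_rq_total",
     {"deployment_type": "virtual_deployment", "virtual_deployment_name": "reviews-v1"}, 10.0),
    ("envoy_cluster_upstream_rq_total",
     {"deployment_type": "virtual_deployment", "virtual_deployment_name": "reviews-v2"}, 5.0),
    ("envoy_cluster_upstream_rq_total",
     {"deployment_type": "ingress_deployment", "ingress_deployment_name": "edge-gateway"}, 4.0),
]

virtual_only = filter_by_tags(samples, deployment_type="virtual_deployment")
print(sum_by_tag(virtual_only, "virtual_deployment_name"))
# {'reviews-v1': 10.0, 'reviews-v2': 5.0}
```

In Prometheus or Grafana, the equivalent is a PromQL label matcher on the same tag.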
Naming Conventions
As part of proxy configuration setup, Service Mesh internally generates names for
various resources in the following format. These names become part of the stat names.
For example, Service Mesh generates the following cluster name for a virtual service
deployment:
When you scrape the /stats/prometheus endpoint, the cluster stats associated with this
deployment look like the following. The cluster name is removed from the stat name and
added to the stat as the cluster_name tag:
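As a rough illustration of this renaming, the following Python sketch approximates how
Envoy's Prometheus output moves a cluster name out of a raw stat name such as
cluster.<name>.upstream_rq_total and into a label, then prefixes envoy_ and replaces
characters that are not valid in Prometheus metric names with underscores. It handles
only the cluster-name tag, assumes the cluster name contains no dots, and uses a
hypothetical cluster name rather than the format Service Mesh generates. Stock Envoy
labels the cluster as envoy_cluster_name, so the label key you see may differ from the
cluster_name wording used above.

```python
import re

# Simplified: assumes the cluster name itself contains no dots.
CLUSTER_STAT = re.compile(r"^cluster\.([^.]+)\.(.+)$")


def to_prometheus(stat_name):
    """Map a raw Envoy stat name to (prometheus_name, labels)."""
    labels = {}
    match = CLUSTER_STAT.match(stat_name)
    if match:
        cluster, rest = match.groups()
        # Move the cluster name out of the stat name and into a tag.
        labels["envoy_cluster_name"] = cluster
        stat_name = "cluster." + rest
    # Prefix with "envoy_" and replace characters that are invalid in
    # Prometheus metric names (such as dots) with underscores.
    prom_name = "envoy_" + re.sub(r"[^a-zA-Z0-9_]", "_", stat_name)
    return prom_name, labels


# Hypothetical cluster name, for illustration only.
print(to_prometheus("cluster.demo-cluster.upstream_rq_total"))
```

Because the cluster name moves into a tag, every cluster shares one metric name, and
you select a specific cluster by label instead of by metric name.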
Logging
After you install a mesh, OCI Logging is activated on virtual deployments and ingress
gateways. The OCI Logging service collects logs for later analysis. Service Mesh
provides two types of logs: error logs and traffic logs. You can use these logs to
generate log-based statistics or to debug HTTP 404 and 503 errors.
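Traffic logs are a quick way to see why requests end in 404 or 503. The following Python
sketch counts 404 and 503 responses by Envoy response flag and maps a few well-known
flags to their meaning, such as NR (no route configured) and UH (no healthy upstream
host). It assumes traffic log lines follow Envoy's default access-log format; the format
Service Mesh actually writes to OCI Logging may differ, so treat the regular expression
as a placeholder to adapt.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: lines use Envoy's default access-log format, for example
#   [2024-01-01T00:00:00.000Z] "GET /api HTTP/1.1" 503 UH 0 91 0 - "-" "curl/8.0" ...
# Adapt this pattern if Service Mesh traffic logs use a different layout.
STATUS_RE = re.compile(r'"[A-Z]+ \S+ \S+" (\d{3}) (\S+)')

# Envoy response flags that commonly accompany 404 and 503 responses.
FLAG_HINTS = {
    "NR": "no route configured",
    "UH": "no healthy upstream host",
    "UF": "upstream connection failure",
    "UO": "upstream overflow (circuit breaking)",
}


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count (status code, response flags) pairs for 404 and 503 responses."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) in ("404", "503"):
            counts[(match.group(1), match.group(2))] += 1
    return counts


def explain(counts: Counter) -> List[str]:
    """Render each (status, flags) pair with a short hint, most frequent first."""
    return [
        f"{status} {flags} x{n}: {FLAG_HINTS.get(flags, 'see the Envoy response flag docs')}"
        for (status, flags), n in counts.most_common()
    ]
```

A 404 tagged NR usually points at routing configuration, while a 503 tagged UH or UF
points at the upstream service's health or reachability.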