Compute Instance Health Metrics

You can monitor the health, capacity, and performance of your compute virtual machine (VM) instances by using metrics, alarms, and notifications.

This topic describes the metrics emitted by the metric namespace oci_compute_instance_health.

Resources: Compute VM instances.

Overview of Metrics: oci_compute_instance_health

The following compute instance health metric helps you monitor the status, health, and accessibility of compute instances.

Instance accessibility status: The instance_accessibility_status metric lets you monitor whether a VM instance is unresponsive. Compute sends an Address Resolution Protocol (ARP) request to the instance's virtual network interface card (VNIC). If the ARP ping fails, the metric shows that the instance is unresponsive.

Note

The instance_accessibility_status metric doesn't determine or report the specific reason for the instance's unresponsiveness. The ARP test provides no insight into the possible issues with the instance's OS.

To troubleshoot an unresponsive VM instance:

  1. Check the infrastructure health metrics to determine whether there is an ongoing infrastructure issue. If there is an ongoing infrastructure issue, then wait until Oracle Cloud Infrastructure resolves the issue, and then check the instance_accessibility_status metric again.
  2. If there isn't an ongoing infrastructure issue, then the instance probably has a software issue or a network misconfiguration that you must resolve yourself. Confirm that the OS and network are configured correctly. See the Compute troubleshooting suggestions and Networking troubleshooting suggestions.
  3. If the Compute and Networking troubleshooting steps aren't successful, then you can use a diagnostic reboot to rebuild an unreachable instance.
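The troubleshooting order above can be sketched as a small decision function. This is a minimal illustration only: the names NextStep and next_troubleshooting_step and the three boolean inputs are hypothetical and not part of any Oracle Cloud Infrastructure SDK or API. You would determine each input yourself from the metrics and checks described in the steps.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical labels for the actions described in the steps above."""
    NO_ACTION = "instance is not reported as unresponsive"
    WAIT_FOR_INFRASTRUCTURE_FIX = "wait for the infrastructure issue to be resolved, then recheck the metric"
    CHECK_OS_AND_NETWORK = "confirm that the OS and network are configured correctly"
    DIAGNOSTIC_REBOOT = "use a diagnostic reboot"


def next_troubleshooting_step(
    instance_unresponsive: bool,
    infrastructure_issue_ongoing: bool,
    os_and_network_checked: bool,
) -> NextStep:
    """Return the next action, following the order of the numbered steps."""
    if not instance_unresponsive:
        return NextStep.NO_ACTION
    # Step 1: an ongoing infrastructure issue takes priority over everything else.
    if infrastructure_issue_ongoing:
        return NextStep.WAIT_FOR_INFRASTRUCTURE_FIX
    # Step 2: otherwise, suspect a software issue or network misconfiguration.
    if not os_and_network_checked:
        return NextStep.CHECK_OS_AND_NETWORK
    # Step 3: Compute and Networking troubleshooting did not help.
    return NextStep.DIAGNOSTIC_REBOOT


if __name__ == "__main__":
    step = next_troubleshooting_step(True, False, True)
    print(step.name, "-", step.value)
```

The function encodes only the ordering of the steps. It doesn't detect infrastructure issues or inspect the instance.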

Required IAM Policy

To monitor resources, you must be granted the required type of access in a policy written by an administrator, whether you're using the Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to the monitoring services as well as the resources being monitored. If you try to perform an action and get a message that you don't have permission or are unauthorized, contact the administrator to find out what type of access you were granted and which compartment you need to work in. For more information about user authorizations for monitoring, see IAM Policies.

Available Metrics: oci_compute_instance_health

The following metric is automatically available for your instances. You do not need to enable monitoring on the instance to get this metric.

You can also use the Monitoring service to create custom queries.
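As an illustration of a custom query, the following sketch builds a Monitoring Query Language (MQL) expression for this metric as a plain string. The helper name build_accessibility_query is hypothetical, the one-minute interval and max statistic are assumed defaults rather than recommendations from this topic, and the OCID shown is a placeholder. The optional filter uses the resourceId dimension described in this section.

```python
from typing import Optional

# Statistics accepted by this sketch. MQL supports more; this subset is an assumption.
ALLOWED_STATISTICS = {"max", "min", "mean", "sum", "count"}


def build_accessibility_query(
    interval: str = "1m",
    statistic: str = "max",
    resource_id: Optional[str] = None,
) -> str:
    """Build an MQL expression for instance_accessibility_status.

    If resource_id is given, the query is limited to that instance
    through the resourceId dimension.
    """
    if statistic not in ALLOWED_STATISTICS:
        raise ValueError(f"unsupported statistic: {statistic!r}")
    # Doubled braces produce literal { } around the dimension filter.
    dimension_filter = f'{{resourceId = "{resource_id}"}}' if resource_id else ""
    return f"instance_accessibility_status[{interval}]{dimension_filter}.{statistic}()"


if __name__ == "__main__":
    # All instances in the selected compartment.
    print(build_accessibility_query())
    # A single instance (placeholder OCID).
    print(build_accessibility_query(resource_id="ocid1.instance.oc1..exampleuniqueID"))
```

With the max statistic, an interval evaluates to 1 if any data point in that interval reported the instance as unresponsive.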

The metric includes the following dimensions:

resourceDisplayName: The friendly name of the instance.
resourceId: The OCID of the instance.

Metric: instance_accessibility_status
Metric Display Name: Instance accessibility status
Unit: Count
Description: The accessibility status of a VM instance. A value of 1 indicates that the instance is unresponsive due to an issue with the infrastructure or the instance itself. A value of 0 indicates that an accessibility issue has not been detected. If the instance is stopped, then the metric does not have a value.
Dimensions: resourceDisplayName, resourceId
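The value semantics of instance_accessibility_status can be expressed as a small lookup. This is a sketch under stated assumptions: the function name describe_accessibility is hypothetical, and a missing data point is represented here as None, which is a choice made for this example and not something the Monitoring service defines.

```python
from typing import Optional


def describe_accessibility(value: Optional[float]) -> str:
    """Translate one instance_accessibility_status data point into a description.

    None stands for "no data point", which is what you see when the
    instance is stopped (an assumption about how the caller models gaps).
    """
    if value is None:
        return "no value (for example, the instance is stopped)"
    if value == 1:
        return "unresponsive (infrastructure or instance issue)"
    if value == 0:
        return "no accessibility issue detected"
    # The metric is documented as taking only the values 0 and 1.
    raise ValueError(f"unexpected metric value: {value!r}")


if __name__ == "__main__":
    for sample in (0, 1, None):
        print(sample, "->", describe_accessibility(sample))
```

A value of 1 doesn't identify the cause of the problem, so treat the result as a prompt to start troubleshooting.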

Using the Console

To view compute health metrics for a single instance
  1. Open the navigation menu and click Compute. Under Compute, click Instances.
  2. Click the instance that you're interested in.
  3. Under Resources, click Metrics.
  4. In the Metric namespace list, select oci_compute_instance_health.

    The Metrics page displays a default set of charts for the current instance.

For more information about monitoring metrics and using alarms, see Overview of Monitoring. For information about notifications for alarms, see Overview of Notifications.

To view compute health metrics for all instances in a compartment
  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
  2. Select a compartment.
  3. For Metric namespace, select oci_compute_instance_health.

    The Service Metrics page dynamically updates to show charts for each metric that is emitted by the selected metric namespace.
