Troubleshooting Alarms

Use troubleshooting information to identify and address common issues that can occur while working with alarms in Monitoring.

Alarm Fires and Clears Continually

Troubleshoot an alarm that keeps switching between Firing and OK status values.

Either the alarm interval is too small or the trigger delay is too large (or both). The resource emits the specified metric less often than once per alarm interval, so some evaluation intervals contain no data points.

For example, consider the metric DatabaseAvailability, which is emitted every 5 minutes.

API request (relevant portions):

  "isNotificationsPerMetricDimensionEnabled":false,
  "namespace":"oci_autonomous_database",
  "query":"DatabaseAvailability[1m].absent()",
  "pendingDuration":"PT3M",

Console configuration:

Field             Value
Metric namespace  oci_autonomous_database
Metric name       DatabaseAvailability
Interval          1 minute
Statistic         Mean
Trigger rule      • Operator: absent
                  • Trigger delay minutes: 3
Message grouping  Group notifications across metric streams

Example: Alarm Switches Status

Following is an example of an alarm's status switching between Firing and OK status values from 1:00 to 1:08. Note the OK status at 1:01, 1:02, 1:06, and 1:07. At these times, the alarm evaluation results met the condition for the one-minute interval, but the status change was internally pending because of the three-minute trigger delay. The alarm status changed to Firing at 1:03 and 1:08 because three consecutive evaluations met the condition.

Time  Value in metric chart*  Alarm condition met?                      Alarm status
1:00  0                       No                                        OK
1:01  1                       Yes. Status change is internally pending  OK
1:02  1                       Yes. Status change is internally pending  OK
1:03  1                       Yes                                       Firing
1:04  1                       Yes                                       Firing
1:05  0                       No                                        OK
1:06  1                       Yes. Status change is internally pending  OK
1:07  1                       Yes. Status change is internally pending  OK
1:08  1                       Yes                                       Firing

*For value in metric chart, 0 means the metric is present while 1 means the metric is absent. For an example metric chart, see Creating an Absence Alarm.
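
The status switching in the table can be reproduced with a small simulation. This is only a sketch under a simplified model, not the Monitoring service's actual evaluation logic: it assumes the alarm is evaluated once per minute, that the absent condition is met when no data point falls within the last `interval` minutes, and that the alarm fires once the condition has held for `trigger_delay` consecutive evaluations. The function name `simulate_absence_alarm` and its parameters are hypothetical.

```python
def simulate_absence_alarm(emission_minutes, interval, trigger_delay, minutes):
    """Return (minute, condition_met, status) for each evaluated minute.

    Simplified model (an assumption, not the service's implementation):
    the absent() condition is met when no data point falls in the window
    (minute - interval, minute], and the alarm fires once the condition
    has held for `trigger_delay` consecutive one-minute evaluations.
    """
    emitted = set(emission_minutes)
    consecutive = 0  # number of consecutive evaluations that met the condition
    rows = []
    for minute in minutes:
        window = range(minute - interval + 1, minute + 1)
        condition_met = not any(m in emitted for m in window)
        consecutive = consecutive + 1 if condition_met else 0
        status = "Firing" if consecutive >= trigger_delay else "OK"
        rows.append((minute, condition_met, status))
    return rows


# Metric emitted every 5 minutes (1:00 and 1:05), as minutes past 1:00.
rows = simulate_absence_alarm(
    emission_minutes=[0, 5], interval=1, trigger_delay=3, minutes=range(0, 9)
)
for minute, condition_met, status in rows:
    print(f"1:{minute:02d}  condition met: {condition_met!s:5}  {status}")
```

Under these assumptions the statuses from 1:00 to 1:08 come out as OK, OK, OK, Firing, Firing, OK, OK, OK, Firing, which matches the table.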

To remedy this situation, increase the alarm interval and reduce the trigger delay.

For example, update the interval to 10 minutes and update the trigger delay to 1 minute.

API request (relevant portions):

  "isNotificationsPerMetricDimensionEnabled":false,
  "namespace":"oci_autonomous_database",
  "query":"DatabaseAvailability[10m].absent()",
  "pendingDuration":"PT1M",
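
As a rough illustration, the relevant fields above can be generated from the interval and trigger delay rather than typed by hand. The helper name `absence_alarm_fields` and its parameters are hypothetical, and the result covers only the fields shown in this section; a complete alarm request needs additional fields that are omitted here.

```python
import json


def absence_alarm_fields(namespace, metric_name, interval_minutes, trigger_delay_minutes):
    """Build the alarm fields shown in this section for an absence alarm.

    Hypothetical helper: it returns only the relevant portions of the
    request, not a complete alarm definition.
    """
    return {
        "isNotificationsPerMetricDimensionEnabled": False,
        "namespace": namespace,
        # Interval goes in square brackets after the metric name.
        "query": f"{metric_name}[{interval_minutes}m].absent()",
        # Trigger delay is an ISO 8601 duration, for example PT1M for one minute.
        "pendingDuration": f"PT{trigger_delay_minutes}M",
    }


fields = absence_alarm_fields(
    "oci_autonomous_database", "DatabaseAvailability",
    interval_minutes=10, trigger_delay_minutes=1,
)
print(json.dumps(fields, indent=2))
```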

Console configuration:

Field             Value
Metric namespace  oci_autonomous_database
Metric name       DatabaseAvailability
Interval          10 minutes
Statistic         Mean
Trigger rule      • Operator: absent
                  • Trigger delay minutes: 1
Message grouping  Group notifications across metric streams

Example: Metric is Present, Alarm is OK

In this example, the metric is present at the expected times (every five minutes): 2:00, 2:05, and 2:10. At each time, the alarm evaluates for presence of the metric during the last ten minutes. The alarm's status remains OK for the listed times.

Time  Value in metric chart*  Alarm condition met?  Alarm status
2:00  0                       No                    OK
2:01  1                       No                    OK
2:02  1                       No                    OK
2:03  1                       No                    OK
2:04  1                       No                    OK
2:05  0                       No                    OK
2:06  1                       No                    OK
2:07  1                       No                    OK
2:08  1                       No                    OK
2:09  1                       No                    OK
2:10  0                       No                    OK
2:11  1                       No                    OK

*For value in metric chart, 0 means the metric is present while 1 means the metric is absent. For an example metric chart, see Creating an Absence Alarm.

Example: Metric is Absent, Alarm is Firing

In this example, the metric is present at 2:00, but absent at 2:05 and 2:10. Because the alarm interval is ten minutes, the alarm condition isn't met at 2:05: the data point from 2:00 still falls within the interval. At 2:10, the alarm changes to Firing status because the alarm condition is met (zero metrics were present for the ten-minute interval).

Time  Value in metric chart*  Alarm condition met?  Alarm status
2:00  0                       No                    OK
2:01  1                       No                    OK
2:02  1                       No                    OK
2:03  1                       No                    OK
2:04  1                       No                    OK
2:05  1                       No                    OK
2:06  1                       No                    OK
2:07  1                       No                    OK
2:08  1                       No                    OK
2:09  1                       No                    OK
2:10  1                       Yes                   Firing
2:11  1                       Yes                   Firing

*For value in metric chart, 0 means the metric is present while 1 means the metric is absent. For an example metric chart, see Creating an Absence Alarm.
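
The two examples for the updated configuration can be checked with a short simulation. It is a sketch under a simplified model, not the service's actual evaluation logic: it assumes one evaluation per minute, treats the absent condition as met when no data point falls within the last `interval` minutes, and fires after `trigger_delay` consecutive evaluations that meet the condition. The function name `alarm_statuses` is hypothetical.

```python
def alarm_statuses(emission_minutes, interval, trigger_delay, minutes):
    """Map each evaluated minute to "OK" or "Firing" under a simplified model.

    Assumption: the absent() condition is met when the window
    (minute - interval, minute] contains no emitted data point.
    """
    emitted = set(emission_minutes)
    statuses = {}
    consecutive = 0  # consecutive evaluations that met the condition
    for minute in minutes:
        in_window = any(minute - interval < m <= minute for m in emitted)
        consecutive = 0 if in_window else consecutive + 1
        statuses[minute] = "Firing" if consecutive >= trigger_delay else "OK"
    return statuses


# Minutes are expressed as minutes past 2:00; interval 10, trigger delay 1.
healthy = alarm_statuses([0, 5, 10], interval=10, trigger_delay=1, minutes=range(12))
outage = alarm_statuses([0], interval=10, trigger_delay=1, minutes=range(12))

first_firing = next((m for m, s in outage.items() if s == "Firing"), None)
print("Healthy statuses:", sorted(set(healthy.values())))
print(f"Outage: first Firing at 2:{first_firing:02d}")
```

With the metric present every five minutes, every evaluation stays OK. With the metric absent after 2:00, the first Firing status appears at 2:10, as in the table.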