Troubleshooting Alarms
Use troubleshooting information to identify and address common issues that can occur while working with alarms in Monitoring.
Alarm Fires and Clears Continually
Troubleshoot an alarm that keeps switching between Firing and OK status values.
Either the alarm interval is too small or the trigger delay is too large (or both). The resource emits the specified metric less often than the alarm evaluates it: the time between emissions is longer than the alarm interval.
For example, consider the metric DatabaseAvailability, which is emitted every 5 minutes.
API request (relevant portions):
"isNotificationsPerMetricDimensionEnabled":false,
"namespace":"oci_autonomous_database",
"query":"DatabaseAvailability[1m].absent()",
"pendingDuration":"PT3M",
Console configuration:
Field | Value |
---|---|
Metric namespace | oci_autonomous_database |
Metric name | DatabaseAvailability |
Interval | 1 minute |
Statistic | Mean |
Trigger rule | Operator: absent; Trigger delay minutes: 3 |
Message grouping | Group notifications across metric streams |
- Example: Alarm Switches Status
Following is an example of an alarm's status switching between Firing and OK status values from 1:00 to 1:08. Note the OK status at 1:01, 1:02, 1:06, and 1:07. At these times, the alarm evaluation results met the condition for the one-minute interval, but the status change was internally pending because of the three-minute trigger delay. The alarm status changed to Firing at 1:03 and 1:08 because three consecutive evaluations met the condition.
Time | Value in metric chart* | Alarm condition met? | Alarm status |
---|---|---|---|
1:00 | 0 | No | OK |
1:01 | 1 | Yes. Status change is internally pending | OK |
1:02 | 1 | Yes. Status change is internally pending | OK |
1:03 | 1 | Yes | Firing |
1:04 | 1 | Yes | Firing |
1:05 | 0 | No | OK |
1:06 | 1 | Yes. Status change is internally pending | OK |
1:07 | 1 | Yes. Status change is internally pending | OK |
1:08 | 1 | Yes | Firing |
*For value in metric chart, 0 means the metric is present while 1 means the metric is absent. For an example metric chart, see Creating an Absence Alarm.
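The status sequence in the table can be reproduced with a small simulation. This is a minimal sketch, not the Monitoring service's actual evaluation logic: it assumes one evaluation per minute and treats the trigger delay as the number of consecutive evaluations that must meet the condition before the status changes to Firing. The function name `simulate_alarm` and its parameters are hypothetical.

```python
def simulate_alarm(condition_met, pending_evaluations):
    """Return the alarm status after each evaluation.

    condition_met: list of booleans, one per evaluation.
    pending_evaluations: consecutive matching evaluations required
        before the status changes to Firing (assumed model of the
        trigger delay, one evaluation per minute).
    """
    statuses = []
    streak = 0  # consecutive evaluations that met the condition
    for met in condition_met:
        streak = streak + 1 if met else 0
        statuses.append("Firing" if streak >= pending_evaluations else "OK")
    return statuses


# Chart values from 1:00 to 1:08 (1 = metric absent, 0 = metric present).
chart_values = [0, 1, 1, 1, 1, 0, 1, 1, 1]
statuses = simulate_alarm([v == 1 for v in chart_values], pending_evaluations=3)

for minute, status in enumerate(statuses):
    print(f"1:{minute:02d} {status}")
```

Under these assumptions, the single present datapoint at 1:05 resets the streak, which is why the alarm returns to OK and takes three more evaluations to fire again.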
To remedy this situation, update the following alarm configuration:
- Alarm interval to be equal to or greater than the frequency of the metric emission. See Selecting the Interval for an Alarm Query.
- Trigger delay to accommodate latency. See Defining the Trigger Delay for an Alarm.
For example, update the interval to 10 minutes and update the trigger delay to 1 minute.
API request (relevant portions):
"isNotificationsPerMetricDimensionEnabled":false,
"namespace":"oci_autonomous_database",
"query":"DatabaseAvailability[10m].absent()",
"pendingDuration":"PT1M",
Console configuration:
Field | Value |
---|---|
Metric namespace | oci_autonomous_database |
Metric name | DatabaseAvailability |
Interval | 10 minutes |
Statistic | Mean |
Trigger rule | Operator: absent; Trigger delay minutes: 1 |
Message grouping | Group notifications across metric streams |
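The interval guideline (alarm interval equal to or greater than the metric emission frequency) can be checked before saving an alarm. The following is a minimal sketch with hypothetical helper names. It assumes the interval appears in the query in square brackets with an `m`, `h`, or `d` unit, as in the examples on this page, and that you already know how often the resource emits the metric.

```python
import re

# Minutes per interval unit; only m, h, and d are handled in this sketch.
UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440}


def query_interval_minutes(query):
    """Extract the interval, in minutes, from a query such as 'Metric[10m].absent()'."""
    match = re.search(r"\[(\d+)([mhd])\]", query)
    if match is None:
        raise ValueError(f"no interval found in query: {query!r}")
    return int(match.group(1)) * UNIT_MINUTES[match.group(2)]


def interval_covers_emission(query, emission_period_minutes):
    """True when the alarm interval is at least as long as the emission period."""
    return query_interval_minutes(query) >= emission_period_minutes


# DatabaseAvailability is emitted every 5 minutes in this example.
original_ok = interval_covers_emission("DatabaseAvailability[1m].absent()", 5)
updated_ok = interval_covers_emission("DatabaseAvailability[10m].absent()", 5)
print(original_ok, updated_ok)
```

The original one-minute query fails the check, while the updated ten-minute query passes it.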
- Example: Metric is Present, Alarm is OK
In this example, the metric is present at the expected times (every five minutes): 2:00, 2:05, and 2:10. At each time, the alarm evaluates for presence of the metric during the last ten minutes. The alarm's status remains OK for the listed times.
- Example: Metric is Absent, Alarm is Firing
In this example, the metric is present at 2:00, but absent at 2:05 and 2:10. Because the alarm interval is ten minutes, the alarm condition wasn't met at 2:05. At 2:10 the alarm changes to Firing status because the alarm condition is met (zero metrics were present for the ten-minute interval).
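Both examples can be sketched as a check for datapoints inside a ten-minute window. This is an illustration only: the function names are hypothetical, and the window is assumed to exclude its start and include its end, which is one reading that matches the results described above. The service's exact window boundaries are not stated here.

```python
def minutes(clock):
    """Convert 'H:MM' to minutes so that times can be compared numerically."""
    hours, mins = clock.split(":")
    return int(hours) * 60 + int(mins)


def absence_condition_met(emission_times, evaluation_time, interval_minutes):
    """True when no datapoint falls in (evaluation_time - interval, evaluation_time].

    The half-open window is an assumption made for this sketch.
    """
    window_start = evaluation_time - interval_minutes
    return not any(window_start < t <= evaluation_time for t in emission_times)


evaluations = [minutes(t) for t in ("2:00", "2:05", "2:10")]

# First example: the metric is emitted at every expected time.
present = [minutes(t) for t in ("2:00", "2:05", "2:10")]
# Second example: the metric is emitted at 2:00 only.
absent = [minutes("2:00")]

present_results = [absence_condition_met(present, t, 10) for t in evaluations]
absent_results = [absence_condition_met(absent, t, 10) for t in evaluations]
print(present_results)
print(absent_results)
```

In the second example, the 2:00 datapoint still falls inside the window evaluated at 2:05, so the condition is first met at 2:10.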