Operations Insights Metrics

You can monitor for conditions where incoming data for any Operations Insights-enabled target has been delayed for the last one or two days by using metrics, alarms, and notifications.

This topic covers the metrics emitted by the Operations Insights service.

Overview of Operations Insights Metrics

Operations Insights relies on a constant flow of data from a variety of sources, such as Autonomous Databases and Enterprise Manager targets (for example, hosts and databases).

Required Policies

To monitor resources, you must be given the required type of access in a policy. The policy must give you access to the monitoring services as well as the resources being monitored. If you try to perform an action and get a message that you don't have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For more information on user authorizations for monitoring, see the Authentication and Authorization section for the related service: Monitoring or Notifications.

For information on required Operations Insights policies, see Set Up Groups and Policies.
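
As an illustration only, the following policy statements sketch the kind of access the Monitoring and Notifications services typically require; the group name OpsiMonitoringUsers and compartment name MyCompartment are hypothetical placeholders, and your administrator may scope access differently:

Allow group OpsiMonitoringUsers to read metrics in compartment MyCompartment
Allow group OpsiMonitoringUsers to manage alarms in compartment MyCompartment
Allow group OpsiMonitoringUsers to manage ons-topics in compartment MyCompartment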

The following topics are covered:

  • Dimensions Common Across Operations Insights Metrics
  • Metrics
  • Create Alarms
  • Specific Alarm Conditions
  • About Forecast Issues
  • Using the Console
  • Using the APIs

Dimensions Common Across Operations Insights Metrics

The following table shows dimensions common across all metrics emitted by Operations Insights except WarehouseCpuUtilization.

Dimensions Description
resourceId Operations Insights ID for the target.
resourceDisplayName Display Name of the target.
resourceType Type of resource. For example: ADB-S, ATP-D, EXTERNAL-HOST, EXTERNAL-PDB, EXTERNAL-NONCDB
telemetrySourceType The source of the metric: CloudInfrastructure, EnterpriseManager, AgentService.
telemetrySourceIdentifier Depending on the telemetry source type, this field will contain one of the following:
  • For ManagementAgent - Management Agent OCID
  • For EnterpriseManager - Enterprise Manager Bridge OCID
  • For CloudInfrastructure - The source database (DBaaS) OCID
telemetrySourceEntityIdentifier Depending on the telemetry source type, this field will contain one of the following:
  • For ManagementAgent - External Database ID
  • For EnterpriseManager - GUID for the Enterprise Manager target.
associatedOCIResourceId The ADW OCID. This will only be populated for Autonomous Database targets.
sourceMetricName Source Metric name for which the delay is being reported.
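
Because these dimensions are attached to every metric listed below (other than WarehouseCpuUtilization), you can filter or group on them in Monitoring queries. As an illustrative sketch only (the dimension value is one of the example resourceType values listed above), the following query charts the worst data flow delay per display name for external pluggable databases:

DataFlowDelayInHrs[1h]{resourceType="EXTERNAL-PDB"}.grouping(resourceDisplayName).max()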

Metrics

All Operations Insights Metrics

The following table shows all metrics emitted by Operations Insights.

Metric Name Specific Dimensions Description
DataFlowDelayInHrs dataProcessingFrequencyInHrs - Frequency of data processing in hours.
Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.

Number of hours since the data was last processed for a given target and metric.

The DataFlowDelayInHrs metric lets you monitor data flow interruptions across all enabled targets and quickly identify which sources are having problems.

WarehouseCpuUtilization

resourceId - Operations Insights ID for the warehouse

resourceDisplayName - Display Name of the Operations Insights warehouse.

CPU utilization, in percent, of the ADW provisioned for the Operations Insights warehouse.
DaysToReachHighUtilization
resourceMetric -
  • Databases use: CPU, MEMORY, STORAGE
  • Hosts use: CPU, LOGICAL_MEMORY, STORAGE
  • Exadata use: CPU, STORAGE, MEMORY, IOPS, THROUGHPUT

aggregateDataMeasure - Indicates which underlying aggregate measure is used in the forecast. Currently this can be AVG or MAX.

forecastModel - Indicates which forecast model is used in the forecast. Currently this can be SEASONALITY_AWARE, LINEAR_REGRESSION, or AUTOML.

exceededForecastWindow - Indicates whether the number of days returned is equal to the number of days being forecast. This dimension should be used in alarms, for example: DaysToReachHighUtilization[1D]{resourceMetric="STORAGE", resourceType="Exadata", exceededForecastWindow="false"}.grouping(telemetrySource,resourceId).mean() < 30

Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.
Days to reach high utilization (above the default setting of 75%) for a given resource type and resource metric.

To modify utilization thresholds from the default settings, see Changing Utilization Thresholds.

DaysToReachLowUtilization
resourceMetric -
  • Databases use: CPU, MEMORY, STORAGE
  • Hosts use: CPU, LOGICAL_MEMORY, STORAGE
  • Exadata use: CPU, STORAGE, MEMORY, IOPS, THROUGHPUT

aggregateDataMeasure - Indicates which underlying aggregate measure is used in the forecast. Currently this can be AVG or MAX.

forecastModel - Indicates which forecast model is used in the forecast. Currently this can be SEASONALITY_AWARE, LINEAR_REGRESSION, or AUTOML.

exceededForecastWindow - Indicates whether the number of days returned is equal to the number of days being forecast. This dimension should be used in alarms, for example: DaysToReachLowUtilization[1D]{resourceMetric="STORAGE", resourceType="Exadata", exceededForecastWindow="false"}.grouping(telemetrySource,resourceId).mean() < 30

Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.
Days to reach low utilization (below the default setting of 25%) for a given resource type and resource metric.

To modify utilization thresholds from the default settings, see Changing Utilization Thresholds.
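
For example, a minimal alarm sketch (adapted from the high-utilization examples later in this topic, and assuming the default 25% low-utilization threshold) that fires when a database resource is forecast to reach low CPU utilization within 30 days might look like:

DaysToReachLowUtilization[1D]{resourceMetric="CPU", resourceType="Database", exceededForecastWindow="false"}.grouping(telemetrySource,resourceId).mean() < 30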

SQL Related Metric

The NumSqlsNeedingAttention metric assists you with SQL tuning and performance by letting you set alarms that notify you when SQL statements require attention.

Note

See Specific Alarm Conditions (SQL Alarms) for examples of setting up alarms under various conditions.

The following table shows the metric related to SQL alarms.

Metric Name Specific Dimensions Description
NumSqlsNeedingAttention

isDegraded (0,1) - set to 1 if response time percent change > 20% over the last 24 hours

isVariant (0,1) - set to 1 if SQL variability is > 1.66 over the last 24 hours

isInefficient (0,1) - set to 1 if inefficiency > 20% over the last 24 hours

isPlanChanged (0,1) - set to 1 if the SQL plan has changed over the interval

isIncreasingIo (0,1) - set to 1 if IO increase > 50% over the last 24 hours

isIncreasingCpu (0,1) - set to 1 if CPU increase > 50% over the last 24 hours

isIncreasingWait (0,1) - set to 1 if Wait increase > 50% over the last 24 hours

Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.
  • Emits the total number (greater than 0) of unique SQL statements executed by an Operations Insights database resource that have possible performance conditions, for example, performance degrading beyond an Operations Insights-determined threshold over the past 24 hours, or a plan change. If no SQL statements meet the criteria, nothing is emitted; that is, the metric is generated only when one or more SQL statements require attention.
  • There are seven possible conditions that can be alerted on. These are represented by the is<trend> dimension flags, such as isDegraded, isVariant, or isInefficient. If a dimension flag is set to 1, it means that all the SQL statements represented by the metric have that condition. For example, if a metric reports the number 6, with isDegraded=1 and isPlanChanged=1, it means that 6 SQL statements were observed to have both of these conditions over the last 24 hours. If, over the same interval, there were 3 SQL statements with only isPlanChanged=1, the system would emit a separate metric with a value of 3 and the dimension set to isPlanChanged=1. These values are mutually exclusive, so you can assume that the same SQL will not appear in both metrics. The ability to combine is<trend> dimension flags allows you to easily create alarms for specific combinations of conditions by simply specifying a dimension filter for every condition of interest.
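
For example, a minimal alarm sketch that follows the generic pattern shown later in Specific Alarm Conditions (SQL Alarms), and that fires when more than 5 SQL statements are both inefficient and showing increasing IO, might look like:

NumSqlsNeedingAttention[3h]{isInefficient="1", isIncreasingIo="1"}.absent()==0 && NumSqlsNeedingAttention[3h]{isInefficient="1", isIncreasingIo="1"}.sum() > 5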

Data Flow Metric

Operations Insights consumes data coming from different types of sources such as Autonomous Databases, Enterprise Manager targets (databases, hosts, Exadata, and so on), and Management Agent targets (external databases, hosts, and so on). The data gap metric allows you to set up alarms in the event that data from these sources has stopped flowing for the last one or two days.

Note

For examples on setting up alarms for the data flow metric, see Specific Alarm Conditions (Data Flow Delays).
Metric Name Dimensions Description
DataFlowDelayInHrs

sourceIdentifier - This is the Enterprise Manager Bridge OCID for Enterprise Manager targets, the agent OCID for agent-based targets, and the ADW OCID for Autonomous Database targets.

sourceEntityIdentifier - This is the Enterprise Manager target GUID for Enterprise Manager targets, or the Cloud Infrastructure database OCID for Management Agent-based targets.

associatedResourceId - Populated only for Autonomous Database targets; contains the OCID of the Autonomous Database.

dataProcessingFrequencyInHrs - Frequency of data processing in hours.

Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.
Number of hours since the data was last processed for a given target and metric.

Data Flow Metric Examples

The following examples show the possible values of the dataProcessingFrequencyInHrs dimension for different resource types.

dataProcessingFrequencyInHrs Value: 1.00
Resource Examples: Enterprise Manager managed DB
telemetrySourceType: EnterpriseManager
Description: Loads every hour to process performance metric data accumulated in the Object Storage bucket for Enterprise Manager managed DB targets.

dataProcessingFrequencyInHrs Value: 3.00
Resource Examples: Autonomous DB, Database Cloud Service DB, Enterprise Manager managed DB, Enterprise Manager managed host, Exadata Cell, Cloud Infrastructure DB
telemetrySourceType: CloudInfrastructure, EnterpriseManager, AgentService
Description: Loads every 3 hours to get hourly performance metric data from the Monitoring service (for Cloud Infrastructure and Autonomous DBs) or the Object Storage bucket (for Enterprise Manager managed targets), or to generate hourly rollups from the raw data ingested through the ingestion APIs.

dataProcessingFrequencyInHrs Value: 12.00
Resource Examples: Autonomous DB, Database Cloud Service DB, Enterprise Manager managed DB, Cloud Infrastructure DB
telemetrySourceType: CloudInfrastructure, EnterpriseManager, AgentService
Description: Every 12 hours, loads daily performance metrics data by reading raw data from the Operations Insights data store.

dataProcessingFrequencyInHrs Value: 24.00
Resource Examples: Enterprise Manager managed DB, Enterprise Manager managed host, Exadata Cell
telemetrySourceType: EnterpriseManager
Description: Every 24 hours, two ETLs run to process data for Enterprise Manager managed targets: one loads daily performance metrics data from the Object Storage bucket, and the other loads configuration metrics data from the Object Storage bucket.
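
As a sketch, you can chart the current delay for one processing frequency at a time by filtering on this dimension; for example, the following query (a minimal illustration, not a prescribed alarm) shows the worst delay per resource for the 3-hour data loads:

DataFlowDelayInHrs[1h]{dataProcessingFrequencyInHrs="3.00"}.grouping(resourceId, resourceDisplayName).max()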

Oracle Database Cloud (DBCS) Metric

The MetricCollectionErrors metric reports the number of collection errors for a given target and metric.

Metric Name Specific Dimensions Description
MetricCollectionErrors

associatedResourceId - This is the DBaaS OCID for the resource.

sourceMetricName - Name of the metric collection which is failing.

This dimension can be one of the following:

  • ASHSqlStats
  • ASHSqlTexts
  • ASHSqlPlans
  • cpu_usage
  • memory_usage
  • storage_usage
  • tablespace_usage
  • db_external_properties
  • db_external_instance
  • db_os_config_instance

ErrorCategory - The error category: DatabaseConnection or QueryExecution.

Cause - The actual ORA error code if available (for example, ORA-12850), or NA.

Note

For additional dimensions for this metric, see Dimensions Common Across Operations Insights Metrics.
Number of collection errors for a given target and metric.
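
For example, a minimal alarm sketch (the dimension filter is illustrative; adjust it to the collections you care about) that fires when any connection-related collection errors are reported over the last hour might look like:

MetricCollectionErrors[1h]{ErrorCategory="DatabaseConnection"}.grouping(resourceId, sourceMetricName).sum() > 0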

Create Alarms

Setting Alarms

You can use the Monitoring service's alarm system to alert interested parties when a metric condition is met. You can create alarms on individual resources or on an entire compartment.

Operations Insights provides convenient access to the Monitoring service's alarm creation functionality directly from any fleet resource page.

To create an alarm:
  1. From the left pane, click Administration.
  2. Click a fleet resource (Database Fleet, Host Fleet, Exadata Fleet, or Operations Insights Warehouse).
  3. Click the Action menu (vertical ellipsis) for a specific resource and select Add Alarms. The Add Alarms to Metrics region displays. Expand the description region below each metric to view suggested trigger parameters as well as key dimensions.
    Graphic shows the Add Alarms to Metrics region.

  4. Click Add Alarm. You'll be taken to the Monitoring service Create Alarm page with the required metric details already populated.
    Note

    By default, an alarm applies to an individual resource. If you want the alarm to apply to an entire compartment, remove the resourceId dimension, as shown in the sketch after these steps.
  5. Under Notification > Destinations, select a topic or channel that you want to use for sending notifications when an alarm is triggered. Alternatively, you can create a topic.
  6. Provide an alarm name and set the suggested threshold and trigger delay.
  7. Click Save alarm.
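
The following sketch illustrates the note in step 4 (the resourceId value and threshold are hypothetical placeholders): the first query alarms on a single resource, while the second, with the resourceId filter removed, applies to every resource in the compartment that emits the metric.

DataFlowDelayInHrs[1h]{resourceId = "opsi.ocid", dataProcessingFrequencyInHrs="1.00"}.max() > 24

DataFlowDelayInHrs[1h]{dataProcessingFrequencyInHrs="1.00"}.grouping(resourceId).max() > 24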

Specific Alarm Conditions

SQL Alarms

You can create alarms for conditions defined on the NumSqlsNeedingAttention metric. Alarms need to be created in a specific way in order for them to clear properly. The following examples illustrate how to trigger an alarm under various alert conditions.

Alarm Condition MQL Alarm Definition
You want to trigger an alarm if the total number of SQL statements across all resources that are both degraded and show increasing CPU is greater than 5.
NumSqlsNeedingAttention[3h]
{isIncreasingCpu="1", isDegraded="1"}.absent()==0 && NumSqlsNeedingAttention[3h]{isIncreasingCpu="1", isDegraded="1"}
.sum() > 5
You want to trigger an alarm whenever any resource has a plan change.
NumSqlsNeedingAttention[3h]
{isPlanChanged = "1"}.absent()==0 && NumSqlsNeedingAttention[3h]{isPlanChanged = "1"}
.max() > 0
You want to trigger an alarm whenever a specific resource has a plan change.
NumSqlsNeedingAttention[3h]
{isPlanChanged = "1", resourceId = "opsi.ocid"}
.absent()==0 && NumSqlsNeedingAttention[3h]
{isPlanChanged = "1", resourceId = "opsi.ocid"}
.max() > 0

Similar patterns can be used for any of the dimensions. In general, to trigger an alarm on a specific condition, the generic alarm definition syntax would look like the following:

NumSqlsNeedingAttention[3h]
{dim1="val1", dim2="val2", ....}
.absent()==0 && NumSqlsNeedingAttention[3h]
{dim1="val1", dim2="val2, ...}
.sum() > 5
Note

You must specify both an absent condition and a threshold condition, as shown above, and the dimension specification must be the same in both clauses. Change only the dimensions or the threshold value as needed, and leave the other values as is.

Data Flow Delays

You can create alarms for conditions defined on the DataFlowDelayInHrs metric. The following table shows some recommended alarms you can set up, along with a corresponding Monitoring Query Language (MQL) example that you can use as a template to define your alarms. For more information about setting up alarms, see Managing Alarms.

Alarm Name MQL Alarm Definition Description
DataFlowSourceAlarmFor1HrData DataFlowDelayInHrs[1h]{dataProcessingFrequencyInHrs="1.00"}.grouping(telemetrySource , sourceIdentifier).mean() > 48

Pending duration: 1h

For a sourceType and sourceIdentifier with a 1-hour data processing frequency, the mean value (across targets) of DataFlowDelayInHrs is greater than 48 hours for 6 continuous hours. This indicates that the problem is at the whole source level.
DataFlowResourceAlarmFor1HrData DataFlowDelayInHrs[1h]{dataProcessingFrequencyInHrs="1.00"}.grouping(telemetrySource, resourceId,resourceDisplayName, sourceIdentifier).max() > 24

Pending duration: 1h

For a sourceType, resource & sourceIdentifier, DataFlowDelayInHrs is more than 24 hours for continuous 1 day for the type of data for which data processing frequency is every 1 hour.
DataFlowResourceAlarmFor3HrData DataFlowDelayInHrs[3h]{dataProcessingFrequencyInHrs="3.00"}.grouping(telemetrySource, resourceId, sourceIdentifier).max() > 48

Pending duration: 1h

For a sourceType, resource, and sourceIdentifier, DataFlowDelayInHrs is more than 48 hours continuously for 1 day, for the type of data whose data processing frequency is every 3 hours.
DataFlowResourceAlarmForDailyData DataFlowDelayInHrs[3h]{dataProcessingFrequencyInHrs="24.00"}.grouping(telemetrySource, resourceId, sourceIdentifier).mean() > 72

Pending duration: 1h

For a sourceType, resource, and sourceIdentifier, DataFlowDelayInHrs is more than 72 hours continuously for 1 day, for the type of data whose data processing frequency is every 24 hours.

About Forecast Issues

Operations Insights provides metrics to help you configure alarms for high (default value > 75%) or low (default value < 25%) utilization for a given resource and resource metric. Additionally, you can customize these forecast metric thresholds, which provides more granular capacity management forecasting and lets you be more proactive in resource management by setting threshold values that are more relevant to a specific target type. For more information on setting threshold values, see Changing Utilization Thresholds.

The forecast metrics are generated using at most 100 days of historical data and a forecast window of 90 days. You can verify the forecast from the Operations Insights console by selecting 1 year in the Time Range Filter and High or Low utilization for 90 days, as shown below.


Figures: Time range selector, 90 day high utilization, and 90 day low utilization.

The following table shows a sample of a recommended alarm you can set up along with a corresponding Monitoring Query Language (MQL) example which you can use as a template to define your alarms. For more information about setting up alarms, see Managing Alarms.

Alarm Name MQL Description
DaysToReachHighUtilizationStorageLessThan30D DaysToReachHighUtilization[1D]{resourceMetric="STORAGE", resourceType="Database", exceededForecastWindow="false"}.grouping(telemetrySource,resourceId).mean() < 30 For a sourceType, resourceType, resourceMetric, and sourceIdentifier, DaysToReachHighUtilization is less than 30 days.
DaysToReachHighUtilizationExaStorage DaysToReachHighUtilization[1D]{resourceMetric="STORAGE", resourceType="Exadata", exceededForecastWindow="false"}.grouping(telemetrySource,resourceId).mean() < 30 For a sourceType, resourceType, resourceMetric, and sourceIdentifier, DaysToReachHighUtilization is less than 30 days.
Note

For linear and seasonality-aware forecasts, the forecast window is 90 days, which means that if a specific resource has a forecast of more than 90 days, the metric value will show 91 days by default. For AutoML, the forecast window is determined by the number of data points available.
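
As a sketch (the dimension filter is illustrative), you can list the resources whose forecasts did not reach the threshold within the window by querying for that dimension value directly:

DaysToReachHighUtilization[1D]{resourceMetric="STORAGE", exceededForecastWindow="true"}.grouping(resourceId, resourceDisplayName).mean()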

Using the Console


To view metric charts by dimension

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
  2. For Metric Namespace, select oci_operations_insights.
  3. For Dimensions, click Add.
  4. For Dimension Name, select a dimension and then select a Dimension Value.

    Add more dimensions as needed.

  5. Click Done.

    The Service Metrics page displays a default set of charts for the selected metric namespace and dimension. You can also use the Monitoring service to create custom queries.

For more information about monitoring metrics and using alarms, see Monitoring. For information about notifications for alarms, see Notifications Overview.

To view metric charts using Metrics Explorer

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Metrics Explorer.

    The Metrics Explorer page displays an empty chart with fields to build a query.

  2. Select a compartment.
  3. From Metric Namespace, select oci_operations_insights.
  4. From Metric Name, select a metric.
  5. (Optional) Refine your query.

    For instructions, see To create a query.

  6. Click Update Chart.

    The chart shows the results of your new query. You can optionally add more queries by clicking Add Query below the chart.

For more information about monitoring metrics and using alarms, see Monitoring. For information about notifications for alarms, see Notifications Overview.
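
For example, a minimal query sketch you could build in Metrics Explorer (the grouping is illustrative) to compare the current data flow delay across your targets:

DataFlowDelayInHrs[1h].grouping(resourceDisplayName).max()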

Using the APIs


For information about using the API and signing requests, see REST APIs and Security Credentials. For information about SDKs, see Software Development Kits and Command Line Interface.

Use the following APIs for monitoring: