Data Flow Metrics
Learn about the Spark-related metrics available from the oci_dataflow metric namespace.
Metrics Overview
The Data Flow metrics help you monitor the number of tasks that completed or failed and the amount of data involved. They are free service metrics, available from Service Metrics or Metrics Explorer. See Viewing the Metrics for more information.
Terminology
These terms help you understand what is available with Data Flow metrics.
- Namespace:
- A namespace is a container for Data Flow metrics. The namespace identifies the service sending the metrics. The namespace for Data Flow is oci_dataflow.
- Metrics:
- Metrics are the fundamental concept in telemetry and monitoring. Metrics
define a time-series set of data points. Each metric is uniquely defined
by:
- namespace
- metric name
- compartment identifier
- a set of one or more dimensions
- a unit of measure
- Dimensions:
- A dimension is a key-value pair that defines the characteristics associated with the metric. Data Flow has five dimensions:
- resourceId: The OCID of a Data Flow Run instance.
- resourceName: The name you've given the Run resource. It is not guaranteed to be unique.
- applicationId: The OCID of a Data Flow Application instance.
- applicationName: The name you've given the Application resource. It is not guaranteed to be unique or final.
- executorId: A Spark cluster consists of a driver and one or more executors. The driver has executorId = driver, and the executors have executorId = 1, 2, 3, ..., n.
- Statistics:
- Statistics are metric data aggregations over specified periods of time. Aggregations are done using the namespace, metric name, dimensions, and the data point unit of measure within a specified time period.
- Alarms:
- Alarms are used to automate operations monitoring and performance. An alarm keeps track of changes that occur over a specific period of time and performs one or more defined actions, based on the rules defined for the metric.
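The terms above come together in a metric query. The following is a minimal sketch, assuming the Monitoring Query Language (MQL) form `MetricName[interval]{dimension = "value"}.statistic()` used by the OCI Monitoring service; the helper names `build_mql` and `build_alarm_rule` and the OCID value are hypothetical and exist only for illustration.

```python
def build_mql(metric, interval="1m", statistic="mean", dimensions=None):
    """Return an MQL expression string for one metric (illustrative helper)."""
    dimensions = dimensions or {}
    # Dimension filters are written as key = "value" pairs inside braces.
    filters = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
    selector = f"{{{filters}}}" if filters else ""
    return f"{metric}[{interval}]{selector}.{statistic}()"


def build_alarm_rule(query, operator, threshold):
    """Append a threshold comparison so the query can serve as an alarm rule."""
    return f"{query} {operator} {threshold}"


# Placeholder OCID (an assumption); substitute the OCID of a real Run.
run_ocid = "ocid1.dataflowrun.oc1..example"

# Mean CPU utilization of the driver of one Run, in 5-minute intervals.
driver_cpu = build_mql(
    "CpuUtilization",
    interval="5m",
    statistic="mean",
    dimensions={"resourceId": run_ocid, "executorId": "driver"},
)
print(driver_cpu)

# A rule that would fire when any Run fails within a 5-minute interval.
failed_runs_alarm = build_alarm_rule(build_mql("RunFailed", "5m", "sum"), ">", 0)
print(failed_runs_alarm)
```

The namespace and compartment are not part of the query string itself; they are supplied separately when the query is submitted.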
Prerequisites
To monitor resources in Data Flow, you must be given the required type of access in a policy written by an administrator.
The policy must give you access to the monitoring services and the resources being monitored. This applies whether you're using the Console or the REST API with an SDK, CLI, or another tool. If you try to perform an action, and get a message that you don’t have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For more information on user authorizations for monitoring, see the Authentication and Authorization section for the related service: Monitoring or Notifications.
Available Metrics
Here are the metrics available for Data Flow. The control plane metrics are listed first, then the data plane metrics.
Metric Name | Display Name | Dimensions | Statistic | Description |
---|---|---|---|---|
RunTotalStartUpTime | Run Startup Time | | Mean | The overall startup time for a run. It includes the time for resource assignment and Spark job startup, as well as the time the run waits in various queues internal to the service. |
RunExecutionTime | Run Execution Time | | Mean | The amount of time it takes to complete a run, from the time it starts executing until the time it completes. |
RunTotalTime | Total Run Time | | Mean | The sum of Run Startup Time and Run Execution Time. |
RunSucceeded | Run Succeeded | | Count | Whether or not the run executed successfully. |
RunFailed | Run Failed | | Count | Whether or not the run failed to execute. |

Metric Name | Display Name | Dimensions | Statistic | Description |
---|---|---|---|---|
CpuUtilization | CPU Utilization | | Percent | The CPU utilization by the container allocated to the driver or executor as a percentage. |
DiskReadBytes | Disk Read Bytes | | Sum | The number of bytes read from all block devices by the container allocated to the driver or executor in a given time interval. |
DiskWriteBytes | Disk Write Bytes | | Sum | The number of bytes written to all block devices by the container allocated to the driver or executor in a given time interval. |
FileSystemUtilization | File System Utilization | | Percent | The file system utilization by the container allocated to the driver or executor as a percentage. |
GcCpuUtilization | GC CPU Utilization | | Percent | The CPU utilization by the Java Garbage Collector of the driver or executor as a percentage. |
MemoryUtilization | Memory Utilization | | Percent | The memory utilization by the container allocated to the driver or executor as a percentage. |
NetworkReceiveBytes | Network Receive Bytes | | Sum | The number of bytes received from the network interface by the container allocated to the driver or executor in a given time interval. |
NetworkTransmitBytes | Network Transmit Bytes | | Sum | The number of bytes transmitted from the network interface by the container allocated to the driver or executor in a given time interval. |
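The Statistic column names the aggregation applied to the data points in each interval. The following sketch illustrates the idea of Mean, Sum, and Count over fixed time windows; it is a conceptual illustration only, not the service's implementation, and the function name `aggregate` and the sample values are invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def aggregate(points, interval, statistic):
    """Group (timestamp, value) points into fixed windows and aggregate each."""
    reducers = {"mean": mean, "sum": sum, "count": len}
    reduce_values = reducers[statistic]
    width = interval.total_seconds()
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the window it falls in.
        window_start = timestamp.timestamp() // width * width
        buckets[window_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): reduce_values(values)
        for start, values in sorted(buckets.items())
    }


# Invented sample data standing in for DiskReadBytes data points.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
disk_read = [
    (t0 + timedelta(seconds=10), 1000),
    (t0 + timedelta(seconds=40), 3000),
    (t0 + timedelta(seconds=70), 500),
]

per_minute_sum = aggregate(disk_read, timedelta(minutes=1), "sum")
per_minute_mean = aggregate(disk_read, timedelta(minutes=1), "mean")
for window, total in per_minute_sum.items():
    print(window.isoformat(), total)
```

Choosing a longer interval puts more data points in each window, so a Sum grows while a Mean smooths out short spikes.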
Viewing the Metrics
You can view Data Flow metrics in various ways.
- From the Console, click the navigation menu, click Observability & Management, and under Monitoring, select Service Metrics. See Overview of Monitoring for how to use these metrics.
- From the Console, click the navigation menu, click Observability & Management, and under Monitoring, select Metrics Explorer. See Overview of Monitoring for how to use these metrics.
- From the Console, click the navigation menu, click Data Flow, and select Runs. Under Resources, click Metrics to see the metrics specific to this Run. Set the Start time and End time as appropriate, or choose a time period from Quick Selects. For each chart, you can specify an Interval and the Options as to how to display each metric.
- From the Console, click the navigation menu, click Data Flow, and select Applications. You see the metrics specific to the Runs of this Application. Set the Start time and End time as appropriate, or choose a time period from Quick Selects. For each chart, you can specify an Interval and a Statistic, and the Options as to how to display each metric.
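Besides the Console, the same data can be retrieved programmatically. The following is a minimal sketch, assuming the OCI Python SDK (`oci` package) is installed and a default configuration file exists at `~/.oci/config`; the function names `build_request` and `fetch_cpu_utilization`, the one-minute interval, and the placeholder OCIDs are assumptions for illustration, not part of the Data Flow documentation.

```python
from datetime import datetime, timedelta, timezone


def build_request(run_ocid, hours=1):
    """Build the fields of a summarize-metrics request for one Run."""
    end = datetime.now(timezone.utc)
    return {
        "namespace": "oci_dataflow",
        # Mean CPU utilization per minute, filtered to a single Run.
        "query": f'CpuUtilization[1m]{{resourceId = "{run_ocid}"}}.mean()',
        "start_time": end - timedelta(hours=hours),
        "end_time": end,
    }


def fetch_cpu_utilization(compartment_id, run_ocid):
    """Query the Monitoring service; requires the oci package and credentials."""
    import oci  # imported lazily so build_request works without the SDK

    config = oci.config.from_file()  # assumes the DEFAULT profile in ~/.oci/config
    client = oci.monitoring.MonitoringClient(config)
    details = oci.monitoring.models.SummarizeMetricsDataDetails(
        **build_request(run_ocid)
    )
    response = client.summarize_metrics_data(compartment_id, details)
    for series in response.data:
        # Each series corresponds to one combination of dimension values.
        for point in series.aggregated_datapoints:
            print(series.dimensions.get("executorId"), point.timestamp, point.value)


# Example call (placeholder OCIDs; needs valid credentials and policies):
# fetch_cpu_utilization("ocid1.compartment.oc1..example",
#                       "ocid1.dataflowrun.oc1..example")
```

The call succeeds only if the policies described under Prerequisites grant you access to metrics in the compartment you query.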