To monitor resources, you must be given the required type of access in a policy written by an administrator, whether you're using the Cloud Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to the monitoring services and the resources being monitored. If you perform an action and get a message that you don't have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For information on user authorizations for monitoring and notifications, see the Authentication and Authorization section for the following services: Monitoring and Notifications.
Available Metrics: oci_big_data_service
Two types of metrics are available for Big Data Service.
Cluster metrics
Cluster metrics enable you to obtain a cluster-level report and monitor the distributed key performance indicators.
Node metrics
Node metrics enable you to obtain node-level reports and monitor the status of individual nodes in the cluster.
Big Data Service emits health metrics only when a VM isn't healthy. For example, one metric is emitted when the VM is down, and no metrics are emitted when the VM is up or in the STOPPED state.
Note
Big Data Service doesn't expose DenseIO-related maintenance events through metrics if the compute action is either DISABLE or TERMINATE.
Big Data Service metrics include the following dimensions:
resourceId
The Oracle Cloud ID (OCID) of the Big Data Service cluster (for cluster metrics).
The Oracle Cloud ID (OCID) of the Big Data Service node (for node metrics).
resourceType
BigDataCluster (for cluster metrics)
BigDataClusterNode (for node metrics)
resourceDisplayName
This field serves as a unique identifier for each metric entity. Its value is the node name shown on the Cluster details page.
MaintenanceStatus specific dimensions
maintenanceDueTime
The scheduled start time of the 24-hour maintenance window.
computeMaintenanceAction
The action that Oracle Cloud Infrastructure performs on an instance during scheduled maintenance.
REBOOT: The instance is migrated from the physical host that needs maintenance to a healthy host. If live migration isn't possible, the instance is reboot-migrated.
REBUILD_IN_PLACE: The instance is stopped, rebuilt on the same physical hardware, and restarted. A downtime of several hours occurs during the maintenance process.
recommendedAction
The action that you can take before the scheduled maintenance event, so that you can control how and when your applications experience downtime.
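The dimensions above can be used to filter a metric query. The following is a minimal sketch, assuming the Monitoring Query Language form `metric[interval]{dimension = "value"}.statistic()`; the helper name `build_mql` and the node name `example-node-0` are hypothetical and used only for illustration.

```python
def build_mql(metric, interval="1m", statistic="mean", **dimensions):
    """Build a metric query string filtered by dimension values.

    Assumes the MQL shape metric[interval]{dim = "value", ...}.statistic().
    Dimensions are sorted by name so the output is deterministic.
    """
    selector = ""
    if dimensions:
        pairs = ", ".join(
            f'{name} = "{value}"' for name, value in sorted(dimensions.items())
        )
        selector = "{" + pairs + "}"
    return f"{metric}[{interval}]{selector}.{statistic}()"


# Hypothetical usage: average CPU utilization for one node of a cluster.
query = build_mql(
    "CpuUtilization",
    resourceType="BigDataClusterNode",
    resourceDisplayName="example-node-0",  # placeholder node name
)
print(query)
```

The resulting string would typically be passed, together with the `oci_big_data_service` namespace, to whichever Monitoring client or console query editor you use.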
The metrics listed in the following table are automatically available for any cluster that you create. You don't need to enable monitoring on the resource to get these metrics.
| Metric | Metric Display Name | Unit | Description | Resource Type |
| --- | --- | --- | --- | --- |
| HdfsSpaceUsed | HDFS Space Used | Bytes | Total HDFS space used on the cluster | Cluster |
| HdfsSpaceFree | HDFS Space Free | Bytes | Total free HDFS space on the cluster | Cluster |
| YarnJobsCompleted | Yarn Jobs Completed | Jobs/Min | Number of YARN jobs completed on this cluster | Cluster |
| SparkJobsCompleted | Spark Jobs Completed | Jobs/Min | Number of Spark jobs completed on this cluster | Cluster |
| ServiceCertificateExpiryTime | Service Certificate Expiry Time | Days | Number of days until a particular service certificate in the cluster expires | Cluster |
| CpuUtilization | CPU Utilization | Percentage | Percentage of CPU used | Node |
| DiskUtilization | Disk Utilization | Bytes | Disk space used | Node |
| MemoryUtilization | Memory Utilization | Bytes | Total memory used | Node |
| NetworkBytesIn | Network Bytes In | Bytes/Min | Network bytes in per minute | Node |
| NetworkBytesOut | Network Bytes Out | Bytes/Min | Network bytes out per minute | Node |
| CertificateExpiryTime | Certificate Expiration Time | Days | Days until certificate expiration | Node |
| MaintenanceStatus | Maintenance Status | Count | A value of 0 indicates that the node has no scheduled maintenance reboot. A value of 1 indicates that the node has a scheduled maintenance reboot. | |
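Values from these metrics can be combined or interpreted after retrieval. The following is a minimal sketch, assuming that HdfsSpaceUsed plus HdfsSpaceFree approximates total HDFS capacity (the source does not state this) and that MaintenanceStatus takes only the values 0 and 1 described above. Both function names are hypothetical.

```python
def hdfs_used_percent(space_used_bytes, space_free_bytes):
    """Return HDFS usage as a percentage of (used + free).

    Assumption: used + free is a reasonable stand-in for total capacity.
    """
    total = space_used_bytes + space_free_bytes
    if total <= 0:
        raise ValueError("used + free must be positive")
    return 100.0 * space_used_bytes / total


def has_scheduled_maintenance(maintenance_status):
    """Interpret a MaintenanceStatus datapoint: 1 means a reboot is scheduled."""
    if maintenance_status not in (0, 1):
        raise ValueError("expected a MaintenanceStatus value of 0 or 1")
    return maintenance_status == 1


# Hypothetical datapoints, in bytes.
print(hdfs_used_percent(750, 250))
print(has_scheduled_maintenance(1))
```

A percentage derived this way is convenient for alarm thresholds, since the raw metrics are reported in bytes and depend on cluster size.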