Observability and Management in the Cloud
Use Oracle Cloud Infrastructure (OCI) Observability and Management services to gain visibility and actionable insights that help you manage your cloud environment.
OCI services related to observability and management let you monitor, audit, and alert on changes in your cloud environment. Insights driven by machine learning help you manage resources that are deployed on a variety of technologies across all layers of the stack.
A top priority is to increase automation that enables scalable, predictable results. Use integrated functionality and automation for DevOps monitoring and IT operations management to prevent and solve IT problems.
Observability and Management in OCI includes the following services:
- Application Performance Monitoring
- Application Performance Monitoring provides deep visibility into application performance and enables rapid diagnostics for delivering a consistent level of service. This includes monitoring of multiple components and application logic spread across clients, third-party services, and back-end computing tiers, on premises or in the cloud. For an overview, see the Application Performance Monitoring product page.
- Management Agent
- Management Agent is a service that provides low-latency interactive communication and data collection between OCI and other targets.
- Database Management
- Database Management provides comprehensive database performance diagnostics and management capabilities to monitor and manage Oracle databases. For an overview, see the Database Management product page.
- Logging
- Logging lets you enable, view, and manage all the logs in your tenancy, and provides access to logs from OCI resources. These logs include critical diagnostic information that describes how resources are performing and being accessed. For an overview, see the Logging product page.
- Log Analytics
- Log Analytics is a unified, integrated cloud solution that lets you monitor, aggregate, index, analyze, search, explore, and correlate all log data from your applications and system infrastructure. For an overview, see the Log Analytics product page.
- Java Management
- Java Management is a reporting and management infrastructure within OCI. It lets you observe and manage the use of Java in your enterprise.
- Monitoring
- Use Monitoring to query metrics and manage alarms. Metrics and alarms help monitor the health, capacity, and performance of your cloud resources.
- Ops Insights
- Ops Insights provides comprehensive information about the resource use and capacity of databases and hosts. Use this service to analyze CPU and storage resources, forecast and plan capacity, and proactively identify SQL performance issues across a database fleet. For an overview, see the Ops Insights product page.
- Service Connector Hub
- Service Connector Hub is a cloud message bus platform that offers a single pane of glass for describing, running, and monitoring interactions when moving data between OCI services. For an overview, see the Service Connector Hub product page.
- Stack Monitoring
- Stack Monitoring enables proactive monitoring of applications and their underlying stack, including application servers and databases. By discovering all components of an application, including the application topology, Stack Monitoring automatically collects status, load, response, error, and utilization metrics for all application components. Each component of the application stack is referred to as a resource. For an overview, see the Stack Monitoring product page.
To gain comprehensive visibility into your newly deployed cloud environment, use the Observability and Management services that meet your organization's needs.
Monitoring
Use metrics and alarms to monitor the health, capacity, and performance of your cloud resources.
The following table provides some key areas to consider when defining your organization's monitoring strategy.
| Area | Data to Monitor |
|---|---|
| Accounts | Account management; subscription extension to other regions; creation and deletion of administrative accounts; quota breaches |
| Usage of cloud services | Number of instances; storage, including latest, maximum, and average use; object count, including procedures and views; number of compartments; resources overutilized or underutilized; monthly or yearly utilization of services |
| Metrics | Business metrics; financial metrics |
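As an illustration of the usage data in the table, the following minimal Python sketch summarizes a series of utilization samples into latest, maximum, and average values and flags a resource as overutilized or underutilized. The function name `summarize_utilization`, the 20% and 80% thresholds, and the sample values are assumptions chosen for the example; they are not values defined by OCI.

```python
from statistics import mean


def summarize_utilization(samples, low=20.0, high=80.0):
    """Summarize utilization percentages and classify the resource.

    The thresholds are hypothetical; choose values that fit your workload.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    latest, peak, average = samples[-1], max(samples), mean(samples)
    if average > high:
        status = "overutilized"      # sustained high use
    elif peak < low:
        status = "underutilized"     # never exceeded the low threshold
    else:
        status = "normal"
    return {"latest": latest, "maximum": peak, "average": average, "status": status}


# Hypothetical storage utilization samples, in percent, oldest first.
storage_use = [85.0, 90.0, 88.0, 95.0, 92.0]
report = summarize_utilization(storage_use)
print(report)
```

In practice, the samples would come from metric data collected by the Monitoring service rather than from a hard-coded list.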
Operations
Define operational activities, or common tasks to be performed periodically.
Your operations strategy should include the following recommended activities:
- Define operational procedures
- Establish a maintenance schedule
- Use configuration management utilities
- Back up data in storage and databases
- Verify backup integrity and process
- Validate backup security and encryption
- Replicate your data for disaster recovery
- Automate OS management (OS Management Hub service)
- Automate patching and maintenance
- Stay up to date with security patches, bug fixes, and enhancement updates
- Manage service limits and be aware of fixed service limits
- Factor failover usage in your service limits
- Set compartment quotas
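To illustrate what factoring failover usage into service limits can mean, the following sketch computes the limit a standby region would need if it had to absorb the primary region's workload in addition to its own. The function name, the 20% headroom, and the instance counts are assumptions for the example only.

```python
def required_limit(local_usage, failover_usage, headroom_percent=20):
    """Return the service limit needed to run local plus failed-over workloads.

    Uses integer arithmetic and rounds up, so the result never understates
    the requirement. headroom_percent is a hypothetical safety margin.
    """
    total = local_usage + failover_usage
    scaled = total * (100 + headroom_percent)
    return -(-scaled // 100)  # ceiling division


# Hypothetical example: the standby region runs 30 instances of its own
# and must be able to take over 40 instances from the primary region.
needed = required_limit(local_usage=30, failover_usage=40)
current_limit = 60
shortfall = max(0, needed - current_limit)
print(f"needed={needed}, current={current_limit}, shortfall={shortfall}")
```

A nonzero shortfall indicates that a service limit increase should be requested before a failover is needed, not during one.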
Auditing
Use the Audit service to gain visibility into activities related to your OCI resources and tenancy.
Audit log events can be used for security audits, to track usage of and changes to OCI resources, and to help ensure compliance with standards or regulations.
Your audit strategy should include the following recommended activities:
- Configure auditing
- Conduct audits
- Audit your policies. For example:
  - Where are your policies defined, and do they comply with your organization's standards for compartment usage?
  - Audit the usage of dynamic groups. Do these groups grant excess privileges?
  - What services are configured and where are they located? Should any services be limited to certain compartments or groups?
  - Are there any duplicate statements that should be removed?
  - Are there policies that grant privileges to the whole tenancy?
  - Are there groups that have more privileges than they need?
- Check long-running workflows
- Maintain system logs, application logs, and audit logs
- Continuously scan for vulnerabilities
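Two of the policy audit questions above, duplicate statements and tenancy-wide grants, lend themselves to a simple automated check. The following is a minimal sketch, assuming that the policy statements have already been exported as plain strings. The helper names and sample statements are hypothetical, and the comparison treats statements as case-insensitive and whitespace-insensitive, which is an assumption of this example.

```python
import re
from collections import Counter


def normalize(statement):
    """Lowercase a statement and collapse runs of whitespace."""
    return " ".join(statement.lower().split())


def audit_statements(statements):
    """Report duplicate statements and statements scoped to the whole tenancy."""
    normalized = [normalize(s) for s in statements]
    counts = Counter(normalized)
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    tenancy_wide = sorted(
        {s for s in normalized if re.search(r"\bin tenancy\b", s)}
    )
    return {"duplicates": duplicates, "tenancy_wide": tenancy_wide}


# Hypothetical exported policy statements.
statements = [
    "Allow group NetworkAdmins to manage virtual-network-family in compartment Network",
    "allow group NetworkAdmins to manage  virtual-network-family in compartment Network",
    "Allow group Auditors to inspect all-resources in tenancy",
]
findings = audit_statements(statements)
for kind, items in findings.items():
    print(kind, items)
```

A check like this only surfaces candidates for review. Whether a tenancy-wide grant is appropriate remains a judgment call for the auditor.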
 
Events and Notifications
Use the Events service to create automation in your tenancy. Use the Notifications service to get messages whenever alarms, service connectors, and event rules are triggered.
Events are structured messages that indicate changes in resources. Events trigger actions such as notifications. Because rules for events apply to events in the compartment in which you create them and any child compartments, we recommend that you create rules at the root compartment level.
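The scoping behavior described above can be modeled as a walk up the compartment hierarchy: a rule matches an event when the rule's compartment is the event's compartment or one of its ancestors. The following sketch illustrates that reasoning only. The compartment names and the `parent_of` mapping are hypothetical, and the code does not call any OCI API.

```python
def rule_applies(rule_compartment, event_compartment, parent_of):
    """Return True if a rule created in rule_compartment covers the event.

    parent_of maps each compartment to its parent; the root maps to None.
    """
    current = event_compartment
    while current is not None:
        if current == rule_compartment:
            return True
        current = parent_of.get(current)
    return False


# Hypothetical hierarchy: root -> network -> prod-network, and root -> apps.
parent_of = {
    "root": None,
    "network": "root",
    "prod-network": "network",
    "apps": "root",
}

print(rule_applies("root", "prod-network", parent_of))  # rule at root covers all
print(rule_applies("network", "apps", parent_of))       # sibling branch is not covered
```

This is why a rule created at the root compartment covers every compartment in the tenancy, while a rule created lower in the hierarchy covers only its own branch.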
The Notifications service is a multi-channel messaging service that broadcasts messages to users and applications when events of interest occur within OCI. Messages can be delivered over various subscription protocols, including email, HTTPS, PagerDuty, Slack, and the OCI Functions service. Some channels require confirmation of the subscription before it becomes active.
We recommend that you create at least one notification topic and subscription to receive messages related to Monitoring metrics.
Notifications should also be triggered when there are changes to the following resources:
- Identity provider (IdP)
- IdP group mapping
- OCI Identity and Access Management (IAM) group
- IAM policy
- Users
- Virtual cloud networks (VCNs)
- Route tables
- Security lists
- Network security groups
- Network gateways
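An event rule selects the changes it reacts to through a condition expressed as JSON. As a hedged sketch, the following Python code assembles such a condition for a few of the resources listed above. The event type strings follow the naming pattern commonly shown for IAM policy and VCN events, but treat them as assumptions and confirm each one against the Events service reference before use. The helper name `build_condition` is hypothetical.

```python
import json

# Assumed event type names for IAM policy and VCN changes; verify each
# string against the Events service documentation before relying on it.
WATCHED_EVENT_TYPES = [
    "com.oraclecloud.identitycontrolplane.createpolicy",
    "com.oraclecloud.identitycontrolplane.updatepolicy",
    "com.oraclecloud.identitycontrolplane.deletepolicy",
    "com.oraclecloud.virtualnetwork.createvcn",
    "com.oraclecloud.virtualnetwork.updatevcn",
    "com.oraclecloud.virtualnetwork.deletevcn",
]


def build_condition(event_types):
    """Build a rule condition that matches any of the given event types."""
    unique_types = sorted(set(event_types))
    return json.dumps({"eventType": unique_types}, indent=2)


condition = build_condition(WATCHED_EVENT_TYPES)
print(condition)
```

The resulting JSON could be supplied as the condition of a rule whose action publishes to a notification topic. The remaining resources in the list would each need their own event types added.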