Troubleshooting Oracle Cloud Agent

When using Oracle Cloud Agent, you might encounter the following problems:

  • On the Oracle Cloud Agent tab of the Instance Details page, the status for all plugins is Invalid.
  • In the Metrics section of the Console dashboard, you can't see any CPU, memory, network, or disk metrics for the instance.

If you encounter any of these problems, Oracle Cloud Agent might not be installed or running, or it might not be able to communicate with Oracle services. To diagnose the specific issue, follow these troubleshooting steps.

Tip

In this topic, the instructions for Oracle Linux also apply to CentOS images.

Step 1: Verify that Oracle Cloud Agent is Installed

Follow these steps to confirm that Oracle Cloud Agent is installed on your instance.

  1. Connect to the instance and run one of the following commands, depending on your operating system.
  2. If the message indicating that Oracle Cloud Agent is installed does not display after you run the command, install Oracle Cloud Agent. If Oracle Cloud Agent is installed, proceed to the next step to verify that it is running.
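
As a rough illustration of the check in this step, the following Python sketch queries the package manager and reports whether the agent package is present. The package name oracle-cloud-agent, the os_family keys, and the choice of rpm and snap as query tools are assumptions for illustration, not commands taken from this topic. The run parameter is a hypothetical hook that lets you substitute a different command runner.

```python
import subprocess

# Assumed package queries. The package name "oracle-cloud-agent" mirrors the
# service name used in this topic and might differ on your image.
INSTALL_CHECKS = {
    "oracle-linux": ["rpm", "-q", "oracle-cloud-agent"],
    "ubuntu": ["snap", "list", "oracle-cloud-agent"],
}


def is_agent_installed(os_family, run=subprocess.run):
    """Return True if the package query for os_family exits with status 0."""
    command = INSTALL_CHECKS[os_family]
    try:
        result = run(command, capture_output=True, text=True)
    except FileNotFoundError:
        # The package manager itself is missing, so the query cannot succeed.
        return False
    return result.returncode == 0
```

A result of False corresponds to the case in which the installed message does not display, so the next action would be to install Oracle Cloud Agent.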

Step 2: Verify that Oracle Cloud Agent is Running

After you confirm that Oracle Cloud Agent is installed, follow these steps to confirm that it is running.

  1. Connect to the instance and run one of the following commands to restart Oracle Cloud Agent.
  2. If the message indicating that Oracle Cloud Agent is running does not display after you run the command, run the diagnostic tool and then file a support ticket with the file that contains debugging information and logs for the plugins. If Oracle Cloud Agent is running, proceed to the next step to verify that it can connect to Oracle services.
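
The running check can be sketched in Python as follows, assuming a Linux image on which the agent runs as systemd units named oracle-cloud-agent and oracle-cloud-agent-updater (the service names mentioned later in this topic). The sketch only reads the service state and does not restart anything. The run parameter is a hypothetical hook for substituting a different command runner.

```python
import subprocess

# Service names as they appear in this topic; unit names might differ on some images.
AGENT_SERVICES = ("oracle-cloud-agent", "oracle-cloud-agent-updater")


def service_states(services=AGENT_SERVICES, run=subprocess.run):
    """Map each service name to True if systemd reports it as active."""
    states = {}
    for service in services:
        try:
            # `systemctl is-active --quiet` exits with 0 only for an active unit.
            result = run(["systemctl", "is-active", "--quiet", service])
            states[service] = result.returncode == 0
        except FileNotFoundError:
            # systemctl is not available, so the state cannot be confirmed.
            states[service] = False
    return states
```

If any value in the returned dictionary is False, treat the agent as not running and continue with the diagnostic tool as described in this step.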

Step 3: Verify that Oracle Cloud Agent Can Connect to Oracle Services

If Oracle Cloud Agent is installed and running, but the status for all plugins on the Instance Details page is still Invalid or no metrics appear in the Metrics section of the Console dashboard, Oracle Cloud Agent might be unable to connect to Oracle services. The following sections cover the possible causes. To diagnose the issue, follow these steps in order.

  1. Verify that the instance can access the Instance Metadata Service endpoint.
  2. Check for clock skew errors.
  3. Verify that gateways are configured correctly.
  4. Change your proxy server settings.

Verify that the Instance Can Access the Instance Metadata Service Endpoint

These steps verify whether the instance can access the Instance Metadata Service endpoint.

  1. Connect to the instance and run one of the following commands, depending on your operating system.
  2. If you get a successful response without proxy errors, check for clock skew errors. If proxy server errors occur, check your proxy server settings.
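
For illustration, the endpoint check can be sketched in Python, assuming the version 2 endpoint URL that appears in the monitoring.log sample in this topic and the Authorization: Bearer Oracle header that version 2 requests carry. The function names and the 5-second timeout are hypothetical. The sketch deliberately bypasses any configured proxy, so a failure points at the instance or network rather than at the proxy.

```python
import json
import urllib.request

# Endpoint as it appears in the monitoring.log sample in this topic.
IMDS_URL = "http://169.254.169.254/opc/v2/instance/"


def build_imds_request(url=IMDS_URL):
    """Build a GET request that carries the header expected by IMDS version 2."""
    return urllib.request.Request(url, headers={"Authorization": "Bearer Oracle"})


def fetch_instance_metadata(timeout=5):
    """Query the endpoint directly and return the parsed JSON document."""
    # An empty ProxyHandler ignores http_proxy and https_proxy from the
    # environment, so the request goes straight to the link-local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_imds_request(), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

When run on the instance, fetch_instance_metadata() returns the metadata as a dictionary if the endpoint is reachable. An exception such as urllib.error.URLError suggests that it is not.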

Check for Clock Skew Errors

Sometimes, the clock on an instance is not synchronized with the NTP service. Clock skew can cause TLS negotiations to fail, preventing the instance from connecting to Oracle services. Follow these steps to check for clock skew errors.

  1. Connect to the instance and run one of the following commands to generate the monitoring.log file.

    If there is a clock skew error, a message similar to the following displays:

    failed to call: Service error:NotAuthenticated. Date 'Tue, 09 Mar 2021 06:39:35 UTC' is not within allowed clock skew.
    Current 'Tue, 09 Mar 2021 06:45:45 UTC', valid datetime range: ['Tue, 09 Mar 2021 06:40:45 UTC', 'Tue, 09 Mar 2021 06:50:46 UTC'].
    http status code: 401. Opc request id: <unique_id>
  2. If a clock skew error occurs, configure the Oracle Cloud Infrastructure NTP service for your instance. If no clock skew error occurs, verify that gateways are configured correctly.
  3. If you configured the NTP service in the previous step, after you complete the configuration, run one of the following commands to restart Oracle Cloud Agent:
  4. Generate the monitoring.log file again.

    If Oracle Cloud Agent is running correctly, its requests return status 200 OK. In monitoring.log, look for messages similar to the following:

    2021/03/18 03:12:44.391381 t2.go:139: Sent metrics status: 200; took: 387ms; with opc-request-id:<unique_ID>;
    2021/03/18 03:13:44.006391 instancemetadata_client.go:64: fetched metadata from http://169.254.169.254/opc/v2/instance/ , status 200 OK
    2021/03/18 03:13:44.730102 t2.go:139: Sent metrics status: 200; took: 723ms; with opc-request-id:<unique_ID>;
    2021/03/18 03:14:44.324046 t2.go:139: Sent metrics status: 200; took: 320ms; with opc-request-id:<unique_ID>;
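
The two log checks above can be sketched in Python. The patterns are derived only from the sample messages shown in this topic, so treat them as assumptions about the log format rather than a stable interface. The function names are hypothetical, and the sketch assumes that Date is the time the instance sent and Current is the time seen by the service.

```python
import re
from datetime import datetime

# Patterns based on the sample messages in this topic (assumed format).
SKEW_RE = re.compile(
    r"Date '([^']+)' is not within allowed clock skew\.\s+Current '([^']+)'"
)
SENT_RE = re.compile(r"Sent metrics status: (\d{3});")
TIME_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def clock_skew_seconds(log_text):
    """Return the skew in seconds from the first clock skew error, or None."""
    match = SKEW_RE.search(log_text)
    if match is None:
        return None
    instance_time = datetime.strptime(match.group(1), TIME_FORMAT)
    service_time = datetime.strptime(match.group(2), TIME_FORMAT)
    return (service_time - instance_time).total_seconds()


def sent_metrics_statuses(log_text):
    """Return the HTTP status of every 'Sent metrics' line, in order."""
    return [int(status) for status in SENT_RE.findall(log_text)]
```

For the sample error in this section, clock_skew_seconds returns 370.0, meaning that the instance clock is 6 minutes and 10 seconds behind. After the NTP fix and restart, sent_metrics_statuses should contain only 200 values.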

Verify Permissions for Windows Domain Joined Instances

If you have a Windows instance that is joined to a domain, verify that the virtual account is granted the Log on as a service user right in the local Group Policy. To set permissions, follow the steps for enabling service log on through a local group policy in Microsoft's Enable Service Logon guide. For Log on as a service, add the user NT SERVICE\ALL SERVICES or the specific user.

Verify that Gateways are Configured Correctly

For Oracle Cloud Agent to communicate with Oracle services, gateways in subnets must be configured correctly. Follow these steps to verify and correct your configuration.

  1. Configure the internet gateway, NAT gateway, or service gateway for the subnet in your VCN.
  2. After you follow the configuration steps, restart Oracle Cloud Agent using the commands in Step 2: Verify that Oracle Cloud Agent is Running. After the restart, check the monitoring.log file for successful requests to Oracle services.

Change Proxy Server Settings

Sometimes, local proxy servers prevent Oracle Cloud Agent from communicating with any services. Proxy configurations vary, so the exact fix depends on your environment.

Often, setting the http_proxy, https_proxy, and no_proxy environment variables for the oracle-cloud-agent and oracle-cloud-agent-updater services on the proxy client instances resolves proxy issues. After you set these environment variables, check the proxy server access.log file (or equivalent, depending on your system) for requests from the proxy client to the services that Oracle Cloud Agent accesses.
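
One possible way to set these variables on a Linux instance is a systemd drop-in file for each service. The following Python sketch only renders the text of such a file. It assumes that the agent runs as systemd units with the service names above. The function name, the proxy URL, and the no_proxy value in the usage comment are hypothetical placeholders.

```python
def render_proxy_dropin(http_proxy, https_proxy, no_proxy):
    """Return systemd drop-in text that sets proxy variables for a service."""
    lines = ["[Service]"]
    for name, value in (
        ("http_proxy", http_proxy),
        ("https_proxy", https_proxy),
        ("no_proxy", no_proxy),
    ):
        # One Environment= line per variable, quoted as systemd expects.
        lines.append(f'Environment="{name}={value}"')
    return "\n".join(lines) + "\n"


# Hypothetical usage: keep the metadata address out of the proxy path.
# text = render_proxy_dropin(
#     "http://proxy.example.com:3128",
#     "http://proxy.example.com:3128",
#     "169.254.169.254,localhost",
# )
```

Under these assumptions, the rendered text would be saved in a drop-in directory such as /etc/systemd/system/oracle-cloud-agent.service.d/, followed by systemctl daemon-reload and a restart of the service. Listing 169.254.169.254 in no_proxy keeps requests to the Instance Metadata Service endpoint from being sent through the proxy.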

Step 4: Generate a Diagnostic File for Oracle Cloud Agent

To make it easier for Oracle support to help you troubleshoot issues with the Oracle Cloud Agent software, you can run the Oracle Cloud Agent diagnostic tool on your compute instances. The diagnostic tool generates a file that contains debugging information and logs for the plugins that Oracle Cloud Agent manages.

The diagnostic tool is installed with Oracle Cloud Agent version 1.14.0 and later. To update Oracle Cloud Agent, see Updating the Oracle Cloud Agent Software.
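
The version requirement can be checked with a small comparison, sketched here in Python. The sketch assumes that the agent version is a dotted numeric string, optionally followed by a build suffix after a hyphen. That format and the function names are assumptions, not something this topic specifies.

```python
# Minimum version that includes the diagnostic tool, as stated in this topic.
MIN_DIAGNOSTIC_VERSION = (1, 14, 0)


def parse_version(text):
    """Convert a string such as '1.14.0' or '1.37.2-17' to a 3-part tuple."""
    numeric = text.strip().split("-")[0]
    parts = tuple(int(part) for part in numeric.split("."))
    # Pad short versions such as '1.14' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def has_diagnostic_tool(agent_version):
    """Return True if agent_version is 1.14.0 or later."""
    return parse_version(agent_version) >= MIN_DIAGNOSTIC_VERSION
```

For example, has_diagnostic_tool("1.13.9") is False, which indicates that the agent should be updated before you run the diagnostic tool.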

After you complete the previous troubleshooting steps, run the diagnostic tool and then file a support ticket with the file that contains debugging information and logs for the plugins.