Troubleshooting Logging

Use troubleshooting information to identify and address common issues that can occur while working with Logging.

General Unified Monitoring Agent Troubleshooting

Hardware Requirements

Depending on your logging requirements and configuration (number of logs, type of buffering, and so on), the hardware requirements and performance of the Unified Monitoring Agent can vary widely. When no operational pressure is present (fewer than 1,000 log events per minute), the agent should not consume more than 200 MB of RAM and 20% of a CPU core. The Unified Monitoring Agent service has hard-coded limits of 5 GB of RAM and 40% of a core. 1 GB of RAM is also recommended.

Enabling Monitoring

Monitoring can aid with troubleshooting. See Enabling Monitoring for Compute Instances for more information on how you can enable monitoring (metrics and logging) in your Oracle Cloud Infrastructure Compute instances.

Linux Unified Monitoring Agent

systemd Units

The Unified Monitoring Agent is based on systemd units, and is composed of the following components:

  1. unified-monitoring-agent.service: The main Unified Monitoring Agent service.
  2. unified-monitoring-agent_config_downloader.service: The configuration automatic updater service.
  3. unified-monitoring-agent_config_downloader.timer: The timer unit, which triggers the automatic downloader service at specified, randomized intervals.
  4. unified-monitoring-agent_restarter.path: The path unit, which triggers a reload of the configuration by the Unified Monitoring Agent when a change is detected (because the automatic updater service downloaded a new configuration).

Note

Most of the systemctl or journalctl commands must be run with super user privileges (either as root, or through sudo).

To verify the correct operation of these systemd units, you can use the systemctl command like the following:

systemctl status <unit_name>

Where <unit_name> must be replaced with one of the following values:

  1. unified-monitoring-agent.service
  2. unified-monitoring-agent_config_downloader.service
  3. unified-monitoring-agent_config_downloader.timer
  4. unified-monitoring-agent_restarter.path

Typically these systemctl commands show output similar to the following:

systemctl status unified-monitoring-agent.service
   unified-monitoring-agent.service - unified-monitoring-agent: Fluentd based data collector for Oracle Cloud Infrastructure
   Loaded: loaded (/usr/lib/systemd/system/unified-monitoring-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-29 13:54:03 UTC; 1min 37s ago
     Docs: https://docs.cloud.oracle.com/
  Process: 2337 ExecReload=/bin/kill -USR2 ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 2321 ExecStart=/opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 $EXTRA_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2327 (fluentd)
   Memory: 66.3M (limit: 5.0G)
   CGroup: /system.slice/unified-monitoring-agent.service
           ├─2327 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unif...
           └─2330 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.lo...
systemctl status unified-monitoring-agent_config_downloader.service
  unified-monitoring-agent_config_downloader.service - unified-monitoring-agent Fluentd configuration downloader.
  Loaded: loaded (/usr/lib/systemd/system/unified-monitoring-agent_config_downloader.service; enabled; vendor preset: disabled)
  Active: inactive (dead) since Tue 2020-09-29 13:54:38 UTC; 1min 30s ago
 Process: 2333 ExecStart=/opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluent_config_updater.rb -c /etc/unified-monitoring-agent/conf.d/ -b 10 (code=exited, status=0/SUCCESS)
Main PID: 2333 (code=exited, status=0/SUCCESS) 
systemctl status unified-monitoring-agent_config_downloader.timer
  unified-monitoring-agent_config_downloader.timer - Run unified-monitoring-agent configuration automatic updater.
   Loaded: loaded (/usr/lib/systemd/system/unified-monitoring-agent_config_downloader.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Tue 2020-09-29 13:54:03 UTC; 3min 57s ago 
systemctl status unified-monitoring-agent_restarter.path
  unified-monitoring-agent_restarter.path - "Monitor the /etc/unified-monitoring-agent/conf.d/ directory for changes"
   Loaded: loaded (/usr/lib/systemd/system/unified-monitoring-agent_restarter.path; enabled; vendor preset: disabled)
   Active: active (waiting) since Tue 2020-09-29 13:54:03 UTC; 4min 9s ago 

The most important parts of the systemctl command output are the Loaded and Active fields. The Loaded field should have the value loaded for all systemd units. The Active field should have one of the following values, depending on the unit:

  • active (running) for the unified-monitoring-agent.service unit.
  • active (waiting) or active (running) for the unified-monitoring-agent_restarter.path and the unified-monitoring-agent_config_downloader.timer units.
  • active (running) or inactive (dead) for the unified-monitoring-agent_config_downloader.service unit. For the latter value, the Main PID field includes the value (code=exited, status=0/SUCCESS).
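
The expected values above can be checked in a script. The following is a minimal sketch, not part of the agent: the function names classify_state, check_unit, and check_all_units are hypothetical, and it assumes that the ActiveState and SubState properties printed by systemctl show correspond to the two parts of the Active field (for example, active and running).

```shell
# Pure helper: prints "ok" when "<ActiveState> (<SubState>)" is one of the
# values listed above for the given unit, otherwise prints "unexpected".
classify_state() {
  local unit="$1" state="$2"
  case "$unit" in
    unified-monitoring-agent.service)
      case "$state" in
        "active (running)") echo ok; return ;;
      esac ;;
    unified-monitoring-agent_restarter.path|unified-monitoring-agent_config_downloader.timer)
      case "$state" in
        "active (waiting)"|"active (running)") echo ok; return ;;
      esac ;;
    unified-monitoring-agent_config_downloader.service)
      case "$state" in
        "active (running)"|"inactive (dead)") echo ok; return ;;
      esac ;;
  esac
  echo unexpected
}

# Queries systemd for one unit and prints "<unit>: <state> -> <verdict>".
check_unit() {
  local unit="$1" active sub
  active=$(systemctl show -p ActiveState "$unit" | cut -d= -f2)
  sub=$(systemctl show -p SubState "$unit" | cut -d= -f2)
  printf '%s: %s (%s) -> %s\n' "$unit" "$active" "$sub" \
    "$(classify_state "$unit" "$active ($sub)")"
}

# Runs the check for the four units that make up the agent.
check_all_units() {
  local unit
  for unit in unified-monitoring-agent.service \
              unified-monitoring-agent_config_downloader.service \
              unified-monitoring-agent_config_downloader.timer \
              unified-monitoring-agent_restarter.path; do
    check_unit "$unit"
  done
}
```

After defining the functions, run check_all_units with super user privileges. Any line ending in unexpected points to a unit worth inspecting with systemctl status.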

Check Running Processes

Another way to verify the correct operation of the Unified Monitoring Agent is to check the system’s running processes. When operating correctly, the Unified Monitoring Agent runs two processes: one supervisor process and one worker process. You can verify their existence by running the following command in a terminal (sample output included):

ps aux | grep unified-monitoring-agen[t]
root      2327  0.0  2.3 307704 40864 ?        Sl   13:54   0:00 /opt/unified-monitoring-agent/embedded/bin/ruby /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10
root      2330  0.2  2.1 297456 38192 ?        S    13:54   0:03 /opt/unified-monitoring-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/unified-monitoring-agent/embedded/bin/fluentd --log /var/log/unified-monitoring-agent/unified-monitoring-agent.log --daemon /var/run/unified-monitoring-agent/unified-monitoring-agent.pid --log-rotate-size 1048576 --log-rotate-age 10 --under-supervisor

As shown in the preceding sample, two processes are running with the same arguments, except for the extra --under-supervisor argument added to the second one. That argument denotes the worker process, so the process without it is the supervisor.
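
The same distinction can be counted automatically. The following is a minimal sketch, assuming a hypothetical helper named count_agent_processes that reads ps aux output on standard input and that the fluentd path matches the sample output above:

```shell
# Reads `ps aux`-style lines on stdin and prints "supervisor=<n> worker=<m>".
# A line counts as a worker when it carries --under-supervisor; any other
# line that runs the agent's fluentd binary counts as a supervisor.
count_agent_processes() {
  awk '
    /unified-monitoring-agent\/embedded\/bin\/fluentd/ {
      if ($0 ~ /--under-supervisor/) worker++; else supervisor++
    }
    END { printf "supervisor=%d worker=%d\n", supervisor, worker }
  '
}
```

For example, ps aux | count_agent_processes is expected to report one supervisor and one worker on a correctly operating instance.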

Unified Monitoring Agent Log Location

Note

Most of the systemctl or journalctl commands must be run with super user privileges (either as root, or through sudo).

The Unified Monitoring Agent logs are available at /var/log/unified-monitoring-agent/unified-monitoring-agent.log. This file includes logs from the Unified Monitoring Agent itself.

Besides the agent's logs, which do not contain system-related events (for example, service start, service stop, and so on), you can also view the logs from journald, systemd's system logging service. To view the system logs specific to a unit, you can use the journalctl command like the following:

journalctl -u <unit_name>

Where <unit_name> must be replaced with one of the following values:

  1. unified-monitoring-agent.service
  2. unified-monitoring-agent_config_downloader.service
  3. unified-monitoring-agent_config_downloader.timer
  4. unified-monitoring-agent_restarter.path

When querying journald logs through journalctl, you can also define specific time ranges:

journalctl --since "2020-12-30 00:00:01" --until "2020-12-31 23:59:59"

The date format used is YYYY-MM-DD HH:MM:SS.

You can also tail the journal logs by adding the -f parameter:

journalctl -f
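
The unit filter and the time range can be combined in a single call. The following is a minimal sketch, assuming a hypothetical helper named agent_journal; the -u, --since, --until, and --no-pager options are standard journalctl options:

```shell
# Prints the journal of one unit for a time window.
# $1 = unit name, $2 = start, $3 = end (both as "YYYY-MM-DD HH:MM:SS").
agent_journal() {
  journalctl --no-pager -u "$1" --since "$2" --until "$3"
}
```

For example: agent_journal unified-monitoring-agent.service "2020-12-30 00:00:01" "2020-12-31 23:59:59"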

The Unified Monitoring Agent is Not Installed

For newly created instances, automatic installation of the agent can take up to 25 minutes. If the agent is not installed after this period, check the following:

  1. The network connectivity of the instance.
  2. Whether monitoring is enabled in the Console.

You can also check the log file /var/log/oracle-cloud-agent/plugins/unifiedmonitoring/unifiedmonitoring.log for information regarding the installation of the Unified Monitoring Agent by the Oracle Cloud Agent.

The Unified Monitoring Agent is Not Running

If the status of the units is not loaded and active, or if the supervisor and worker processes are not both running, restart the Unified Monitoring Agent and check the logs for any problems:

systemctl restart unified-monitoring-agent

Configuration Not Automatically Downloaded

Ensure you have followed the steps in Installing the Agent and Verify Agent Installation. Consult the journal of the automatic configuration updater service by running:

journalctl -u unified-monitoring-agent_config_downloader.service

Configuration Not Automatically Reloaded

Ensure you have followed the steps in Installing the Agent and Verify Agent Installation. Consult the journal of all the units:

  1. The timer unit must have run at least one time.
  2. The automatic configuration download service must have run after the timer unit triggered it. You can verify from its logs that the configuration has been downloaded and extracted to the Unified Monitoring Agent's configuration directory. You can also verify this by listing the files in that directory: ls -lhatR /etc/unified-monitoring-agent.
  3. Verify that the path unit is active by checking its status: systemctl status unified-monitoring-agent_restarter.path.
  4. Verify that a reload signal has been received by the Unified Monitoring Agent, by inspecting its journal: journalctl -u unified-monitoring-agent_config_downloader.service. "Reloading unified-monitoring-agent" appears in the output of this command.
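
Step 4 can be scripted as a simple text search. The following is a minimal sketch, assuming a hypothetical helper named has_reload_message; it relies only on the message text quoted in step 4:

```shell
# Reads journal text on stdin and reports whether the reload message appears.
has_reload_message() {
  if grep -q 'Reloading unified-monitoring-agent'; then
    echo "reload message found"
  else
    echo "reload message not found"
  fi
}
```

For example: journalctl --no-pager -u unified-monitoring-agent_config_downloader.service | has_reload_message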

Test Parsing Pattern and Force Agent to Immediately Download the Configuration

Run the following command:

systemctl restart unified-monitoring-agent_config_downloader

Note

Automatic update of the configuration on the agent side can take up to 30 minutes.

Agent Configuration for Linux Instance Fails

Agent configuration can fail if a configuration that uses the Windows endpoint is applied to a Linux instance, because Fluentd on Linux does not support the Windows plugin and fails on initialization. The usual cause is that a Windows agent configuration was created and assigned to the same dynamic group that also has a Linux instance configuration. Such a setup is not supported. Linux and Windows instances require their own dynamic groups.

Create a Custom Log to View the Contents of an Alert Log of a Database System Using OCI

The Unified Monitoring Agent does not support the database system.

Data Collection

If you want to open a ticket so an engineer can help you with your problem regarding the Unified Monitoring Agent, include the output of the following commands. Super user privileges might be required for some of them.

yum info unified-monitoring-agent
rpm -ql unified-monitoring-agent | xargs sha512sum
systemctl status --full unified-monitoring-agent.service
systemctl status --full unified-monitoring-agent_config_downloader.service
systemctl status --full unified-monitoring-agent_config_downloader.timer
systemctl status --full unified-monitoring-agent_restarter.path
journalctl -a --no-pager -u unified-monitoring-agent.service
journalctl -a --no-pager -u unified-monitoring-agent_config_downloader.service
journalctl -a --no-pager -u unified-monitoring-agent_config_downloader.timer
journalctl -a --no-pager -u unified-monitoring-agent_restarter.path

For Ubuntu, use the following commands in place of the yum and rpm commands:

apt show unified-monitoring-agent
dpkg -L unified-monitoring-agent | xargs sha512sum
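
The per-unit systemctl and journalctl commands listed above can be run in a loop so that each output lands in its own file. The following is a minimal sketch, assuming a hypothetical helper named collect_agent_diagnostics and an output directory of your choice; the file naming is an illustration only:

```shell
# Writes the status and journal of each agent unit into the given directory.
# $1 = output directory (created if it does not exist).
collect_agent_diagnostics() {
  local out_dir="$1" unit
  mkdir -p "$out_dir" || return 1
  for unit in unified-monitoring-agent.service \
              unified-monitoring-agent_config_downloader.service \
              unified-monitoring-agent_config_downloader.timer \
              unified-monitoring-agent_restarter.path; do
    # systemctl status exits non-zero for inactive units, so the exit
    # code is ignored here and only the output is kept.
    systemctl status --full "$unit" > "$out_dir/$unit.status.txt" 2>&1
    journalctl -a --no-pager -u "$unit" > "$out_dir/$unit.journal.txt" 2>&1
  done
}
```

For example, collect_agent_diagnostics /tmp/uma_diagnostics produces eight text files that can be attached to the ticket.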

Also include an archive of the files under /var/log/unified-monitoring-agent/ and /var/log/oracle-cloud-agent/. You can create a gzipped tar archive of these directories with the command:

tar cvzf agent_logs_$(date +%s).tar.gz /var/log/unified-monitoring-agent/ /var/log/oracle-cloud-agent/

If the Unified Monitoring Agent is running but has erratic behavior, you can also include backtrace and memory profile information, by running the following command and including the files /tmp/sigdump-<integer>.log in your report (where <integer> is an integer with 1–6 digits, even though in rare cases it might have more than that).

ps aux | grep unified-monitoring-agen[t] | grep ruby | awk '{print $2}' | xargs kill -SIGCONT

This command finds the PIDs of the Unified Monitoring Agent processes and sends them the SIGCONT signal, which causes a dump to be generated in /tmp/sigdump-<integer>.log.

Uninstall and Reinstall

You can remove the Unified Monitoring Agent, without removing the agent's configuration, by running the following command:

yum -y remove unified-monitoring-agent

For Ubuntu:

apt -y remove unified-monitoring-agent

The agent's configuration remains under the /etc/unified-monitoring-agent/ directory. If you do not want to keep the configuration for a future (re)installation of the Unified Monitoring Agent package, you need to remove it manually:

# use the following command to print the contents of the agent's configuration directory
find /etc/unified-monitoring-agent/
# use the following command to remove the directory and all of its contents (this step cannot be undone)
rm -rf /etc/unified-monitoring-agent/

The agent is automatically reinstalled by the Oracle Cloud Agent within 25 minutes at most. Monitoring must be enabled for your instance in the Console for this to occur. See Oracle Cloud Agent for more information.

Windows Unified Monitoring Agent

To Check Service Status

  1. The agent runs as part of a Windows service. To see its status, open the Start menu, type Services.msc, and open it. Go to the service Oracle Cloud Unified Monitoring Service to see the status.
  2. Right-click the service and select Properties for more information. Start, stop, and restart options are available here.
  3. From the Start menu, type cmd, right-click Command Prompt, and select Run as Administrator. Run the following commands:
  • To view Unified Monitoring Agent service status:
    sc query unified-monitoring-agent
  • Restart the Unified Monitoring Agent service:
    sc stop unified-monitoring-agent
    sc start unified-monitoring-agent

Note

The preceding commands do not work in PowerShell, so you must instead use the Windows Command Prompt.

To Find Windows Service Errors

  1. From the Start menu, type Event Viewer and select it.
  2. Open Windows Logs, then System. Every time a service starts or stops, fails to do either, or crashes suddenly, it is recorded here.
    Note

    On most Windows machines, there is a cap on how many events can be in the event viewer. As a result, if an event happened a long time ago, the logs might not be available.

To View Fluentd Logs

  1. Open explorer.exe (file icon on the task bar).
  2. Go to C:\oracle_unified_agent.
  3. If there is only one file, it means that there isn’t a valid configuration file on the machine.
  4. If there are two files, then there is a supervisor log that will have all the setup/start-up logs, and a worker log with all the parsing/output logs. unified-monitoring-agent.conf is the name of the configuration file if it has been downloaded properly.
  5. Run Fluentd manually. Try the preceding steps to identify the issue, but if needed, you can debug an issue by manually running Fluentd.
    Note

    Running Fluentd manually runs it in the Windows service, which stops the service from running normally. This behavior differs from Linux.
  6. Use the following command to run Fluentd manually. This can be run in PowerShell or Command Prompt, but it needs to be run as Administrator:
    C:\oracle_unified_agent\unified-monitoring-agent\embedded\bin\fluentd -c C:\oracle_unified_agent\unified-monitoring-agent.conf -vv

Automatic Configuration Updater Steps

  1. Verify Task Scheduler is running as expected.
  2. From the Start menu, type Task Scheduler and open it.
  3. Go to Task Scheduler (Local), then Task Scheduler Library. Find the task named UnifiedAgentConfigUpdater.
  4. Verify the Last Run Time. If it shows an invalid date or indicates that the task has not run, the Next Run Time shows when the task is scheduled to run for the first time. For debugging, select the task and select Run if you need it to run immediately.
  5. Last Run Result specifies the outcome of downloading the configuration from the control plane. If there is an error result, you need to run it manually to determine what happened. Task Scheduler does not keep output logs.
  6. Run the configuration updater manually.
    Note

    Run the updater in PowerShell as an Administrator for the best experience.
    C:\oracle_unified_agent\unified-monitoring-agent\embedded\bin\ruby.exe C:\oracle_unified_agent\unified-monitoring-agent\embedded\lib\ruby\gems\2.6.0\gems\fluent-public-config-updater*\lib\fluent_config_updater.rb -c C:\oracle_unified_agent -b 10

Check Oracle Cloud Agent Logs

For Windows Server 2012 R2 or 2016, the log file locations are:

  • C:\Users\OCA\AppData\Local\OracleCloudAgent\agent.log
  • C:\Users\OCAUM\AppData\Local\OracleCloudAgent\plugins\unifiedmonitoring\unifiedmonitoring.log (runtime logs)
  • C:\Users\OCAUM\AppData\Local\OracleCloudAgent\plugins\unifiedmonitoring\unifiedmonitoring_msi.log (install logs)
  • C:\oracle_unified_agent\unified-monitoring-agent-0.log (agent worker log, which might not exist depending on state)
  • C:\oracle_unified_agent\unified-monitoring-agent-supervisor-0.log (agent supervisor log, which might not exist depending on state)

For Windows Server 2019, the log file locations are:

  • C:\Windows\ServiceProfiles\OCA\AppData\Local\OracleCloudAgent\agent.log
  • C:\Windows\ServiceProfiles\OCAUM\AppData\Local\OracleCloudAgent\plugins\unifiedmonitoring\unifiedmonitoring.log (runtime logs)
  • C:\Windows\ServiceProfiles\OCAUM\AppData\Local\OracleCloudAgent\plugins\unifiedmonitoring\unifiedmonitoring_msi.log (install logs)
  • C:\oracle_unified_agent\unified-monitoring-agent-0.log (agent worker log, which might not exist depending on state)
  • C:\oracle_unified_agent\unified-monitoring-agent-supervisor-0.log (agent supervisor log, which might not exist depending on state)

Intermittent Failed MSI Install

An intermittent failed MSI install can occur for one of two reasons:

  1. An MSI installation was interrupted (system reboot, process stop, and so on), and on the second run, the msiexec.exe process is still holding a file handle to a folder that it created.
  2. During an upgrade, the MSI fails to get access to the main agent folder because ruby.exe doesn’t end as it should (a Fluentd issue). This causes the MSI to fail and to clean up the system, removing much of the agent (but not the position or buffer files).

In both cases, running the install a second time, or letting Oracle Cloud Agent run through the install a second time, resolves the issue. If the agent is still stuck in this state, do the following:

  1. Stop all msiexec and ruby processes in Task Manager, Details.
  2. Rename C:\oracle_unified_agent to C:\oracle_unified_agent_old.
  3. Install the agent again, or wait for Oracle Cloud Agent to install it.

Generic Paths Not Working with Agent Configuration

Use a forward slash (/) while configuring paths for the Windows agent configuration. A backslash (\) with an asterisk (*) does not work on Windows due to internal limitations. To avoid this, do not use a path like C:\\path\\to\\*\\foo.log. Use the following forward slash method instead:

path C:/path/to/*/foo.log

The following are examples of generic paths that work on Windows:

  • C:/logs/*
  • C:/logs/t2*.txt
  • C:/logs/a*b.txt
  • C:/logs/abc*
  • C:/logs/*.txt
  • C:/logs/*/abc*
  • C:/logs/*/a.txt
  • C:/logs/*/a*b.txt
  • C:/logs*/*.txt

The C:/logs/*log.txt generic path, however, does not work for Windows.
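
When existing configurations contain backslash-separated paths, the separators can be rewritten before the path is placed in the agent configuration. The following is a minimal sketch, assuming a hypothetical helper named to_forward_slashes. It only rewrites separators and does not check whether the resulting wildcard pattern is one of the working forms listed above.

```shell
# Prints the given Windows path with every run of one or more backslashes
# replaced by a single forward slash.
to_forward_slashes() {
  printf '%s\n' "$1" | sed -E 's#\\+#/#g'
}
```

For example, to_forward_slashes 'C:\\path\\to\\*\\foo.log' prints C:/path/to/*/foo.log.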