This section covers some typical issues and resolutions related to the
Management Agents service, such as installing and deinstalling Management Agents and
Management Gateways.
Troubleshoot Management Agents Installation and Configuration Issues
Users may encounter various errors during the Oracle Management Agent
installation and configuration process. Causes and recommended actions for some common
errors are listed below.
Troubleshoot: Please uninstall the agent and remove the service file before installing the new agent!
Cause: There's an agent already installed on your host. A previous
deinstall process did not remove the agent service file successfully.
Action:
Run rpm -e oracle.mgmt_agent to uninstall the
agent. If the command succeeds, try installing the new agent. If the command
fails, try the next recommended action.
Run ls /opt/oracle/mgmt_agent to check for residual files
from the previous agent installation. If the directory exists,
delete it by running: rm -rf /opt/oracle/mgmt_agent.
Check whether an agent service file already exists at the following
location, depending on your Linux version:
For OL7 (if you are using systemd):
/etc/systemd/system/mgmt_agent.service
For OL6 (if you are using init):
/etc/init/mgmt_agent.conf
If the service file exists, remove it by running
rm -rf /etc/systemd/system/mgmt_agent.service (OL7) or
rm -rf /etc/init/mgmt_agent.conf (OL6), and then
retry installing the new agent.
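The checks above can be scripted. The following is a minimal POSIX shell sketch that only lists leftover paths, so that you can review them before deleting anything. The function name and the optional root-prefix argument are hypothetical conveniences for trying the check against a scratch directory; they are not part of the agent installer.

```shell
# Sketch: list leftovers from a previous Management Agent installation.
# The optional argument is a hypothetical root prefix (default "/") so the
# check can be tried on a scratch tree; it is not an installer option.
# Nothing is deleted: review the output before running rm -rf yourself.
list_agent_residue() {
  root="${1:-/}"
  root="${root%/}"                     # "/" -> "", "/tmp/x/" -> "/tmp/x"
  for p in /opt/oracle/mgmt_agent \
           /etc/systemd/system/mgmt_agent.service \
           /etc/init/mgmt_agent.conf; do
    if [ -e "$root$p" ]; then
      echo "$root$p"
    fi
  done
}
```

For example, run rpm -e oracle.mgmt_agent first, then run list_agent_residue and remove only the paths it prints.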
Troubleshoot: "Java is not a 64-bit JVM! Please set path of a 64-bit JVM in the environment variable JAVA_HOME" or "Java not found please set your preferred path in JAVA_HOME"
Cause: The JAVA_HOME environment variable is not
set, or it does not point to a 64-bit JDK location.
Action: Set the JAVA_HOME environment variable to the location of a 64-bit JDK
and retry installing the agent. Currently, only a 64-bit JDK is
supported.
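Before retrying the installation, you can check that JAVA_HOME points to a 64-bit JVM. This sketch assumes the JVM reports "64-Bit" in its java -version output, as HotSpot-based JDKs such as Oracle JDK and OpenJDK do; the function name is illustrative.

```shell
# Sketch: check that a JDK directory (default: $JAVA_HOME) holds a 64-bit JVM.
# Assumes HotSpot-style "java -version" output, which contains "64-Bit"
# on 64-bit builds (for example "OpenJDK 64-Bit Server VM").
check_java_home() {
  jh="${1:-$JAVA_HOME}"
  if [ -z "$jh" ] || [ ! -x "$jh/bin/java" ]; then
    echo "JAVA_HOME is not set or has no executable bin/java"
    return 1
  fi
  # java -version writes to stderr, so merge it into the pipe.
  if "$jh/bin/java" -version 2>&1 | grep -q "64-Bit"; then
    echo "OK: 64-bit JVM at $jh"
    return 0
  fi
  echo "Not a 64-bit JVM: $jh"
  return 1
}
```

For example, export JAVA_HOME to your JDK directory, run check_java_home, and retry the installation only if it reports OK.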
Troubleshoot: Agent Installation failed with message: useradd: Can't get unique GID (no more available GIDs)
Cause: The installation script cannot add a user and group during
the Management Agent installation process because all available group IDs (GIDs) on your
Linux system are already in use.
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
useradd: Can't get unique GID (no more available GIDs)
useradd: can't create group
Agent installation failed, please check log file
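Action: Consult the system administrator before proceeding
with the following:
Edit the /etc/login.defs file to make more GIDs available, for example by
adjusting the GID_MIN and GID_MAX range (or SYS_GID_MIN and SYS_GID_MAX for
system groups). You require sudo privileges to edit the file.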
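To gauge whether a GID range is exhausted before editing /etc/login.defs, the following sketch counts group entries whose GID falls inside a range defined in login.defs. It assumes the standard GID_MIN/GID_MAX and SYS_GID_MIN/SYS_GID_MAX keys; which range the agent installer draws from is not stated here, so check both. The function name is illustrative.

```shell
# Sketch: report how many group entries use GIDs inside a login.defs range.
# $1 is the key prefix ("GID" or "SYS_GID"); the file paths default to the
# system files. Duplicate GIDs are counted once per group entry.
gid_usage() {
  prefix="$1"
  defs="${2:-/etc/login.defs}"
  groups="${3:-/etc/group}"
  min=$(awk -v k="${prefix}_MIN" '$1 == k {print $2; exit}' "$defs")
  max=$(awk -v k="${prefix}_MAX" '$1 == k {print $2; exit}' "$defs")
  if [ -z "$min" ] || [ -z "$max" ]; then
    echo "${prefix}: range not set in $defs"
    return 1
  fi
  used=$(awk -F: -v lo="$min" -v hi="$max" \
    '$3 >= lo && $3 <= hi {n++} END {print n+0}' "$groups")
  echo "${prefix}: range=$min-$max used=$used free=$((max - min + 1 - used))"
}
```

For example, run gid_usage GID and gid_usage SYS_GID; a free count of 0 is consistent with the useradd error above.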
During the Management Agent installation, the mgmt_agent
user is created with the default home directory location under
/usr/share/mgmt_agent.
Cause: There are not enough file permissions under
/usr/share, or the file system is read-only.
Possible Actions:
Set file permissions that give the mgmt_agent user
access to the default home directory location:
/usr/share.
If you want to use a different location, set the USER_HOME_DIR_ROOT
environment variable to the path that you prefer to use as the home directory
location for the mgmt_agent user, and ensure that the management agent user
has the right file permissions on that directory.
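A quick way to see which location would be used, and whether it is writable by the account running the check, is sketched below. It assumes, as described above, that the installer uses USER_HOME_DIR_ROOT when it is set and /usr/share otherwise. The function name is hypothetical, and a "writable" result for root does not prove that the mgmt_agent user can write there.

```shell
# Sketch: show the home-directory root the installer would use (assumption:
# USER_HOME_DIR_ROOT if set, else /usr/share) and whether it is writable
# by the current user.
home_root_status() {
  root="${USER_HOME_DIR_ROOT:-/usr/share}"
  if [ -d "$root" ] && [ -w "$root" ]; then
    echo "$root writable"
  else
    echo "$root not writable"
  fi
}
```

For example, export USER_HOME_DIR_ROOT=/u01/agent_home (a hypothetical path), run home_root_status, and then run the installer from the same shell.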
Troubleshoot: Windows: The system cannot find the path specified. Agent install failed. ERRORLEVEL=9009
Possible Cause: Environment variables have not been set properly
because of spaces in a directory (folder) name.
Windows allows spaces in directory names, which causes an issue with the
Management Agent installation because Windows automatically adds quotes around
such names. For example, for a folder named Program Files, Windows auto-inserts
quotes because the name contains a space, so the value becomes "Program
Files".
These extra quotes cause the issue, because the Management Agent installer does not
allow quotes in environment variables such as JAVA_HOME and
AGENT_INSTALL_BASEDIR.
Note
The Management Agent installer does not accept the following special
characters in the path: [, ^^,
", ', &, or
].
Action:
The recommended way to set up environment variables in Windows is by
using the Advanced System Settings.
On the Windows taskbar, right-click the
Windows icon and select
System.
In the Settings window, under
Related Settings, click Advanced
system settings.
On the Advanced tab, click
Environment Variables.
Click New to create a new environment
variable. Click Edit to modify an existing
environment variable.
After creating or modifying the environment variable, click
Apply and then OK to have
the change take effect.
Note
The graphical user interface for creating environment variables may
vary slightly, depending on your version of Windows.
Troubleshoot: Management Agent status is "Not Available" in Console after the initial installation
Possible Cause 1: Incorrect system timestamp
Action: Verify the system time on the agent's host and correct it if
needed.
Possible Cause 2: If you use the input.rsp
response file to install the Management Agent, you must define the tags required
by your Management Agent compartment.
If the tags are not defined, you may see an error like this:
Action: To define the tags specific to your environment, add the following
parameters to the input.rsp response file and specify the key-value pairs for your
environment. For more information, see Create a Response file.
Troubleshoot: After configuration, the Management Agent is not visible in console or through the API
Possible Cause: If the agent does not display in the Oracle Cloud Console
or through the API after you configure the Management Agent or the Management
Gateway, the correct policies may not be set up for the user or user group.
Action: Verify that the user or user group has the required
policies configured for the Management Agent or Management Gateway. To set up
policies, see Create policies for user group.
Troubleshoot: Prometheus or Kubernetes metrics monitored using Management Agent are not available
Possible Causes: Management Agent does not require a dynamic group or policies for its own metrics, but it does for Prometheus and Kubernetes metrics. You must define a dynamic group, and a policy that allows the agents in that dynamic group to post metrics to OCI Monitoring. If the metrics do not show up in the compartment or in the OCI Monitoring namespace, check the policies and the dynamic group.
For example, ensure that the dynamic group is defined correctly according to the following syntax, with straight single-quote characters (') around the compartment OCID and the managementagent resource type:
ALL {resource.type='managementagent', resource.compartment.id='ocid1.compartment.oc1.examplecompartmentid'}
Possible Cause: Incorrect compartment ID in the dynamic group definition
Action: Verify that the install key's compartment ID is the same as the compartment ID specified in the agent's dynamic group definition. By default, the agent is created in the install key's compartment.
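To avoid quoting mistakes in the dynamic group rule, you can generate it from the install key's compartment OCID. This sketch only checks that the argument looks like a compartment OCID (prefix ocid1.compartment.) and prints the rule in the syntax shown above; the function name and the OCID in the usage note are placeholders.

```shell
# Sketch: print a dynamic group matching rule for Management Agents in one
# compartment, using straight single quotes as the rule syntax requires.
dg_rule() {
  ocid="$1"
  case "$ocid" in
    ocid1.compartment.*) ;;
    *) echo "not a compartment OCID: $ocid" >&2; return 1 ;;
  esac
  printf "ALL {resource.type='managementagent', resource.compartment.id='%s'}\n" "$ocid"
}
```

For example, dg_rule ocid1.compartment.oc1..exampleuniqueid prints a rule that you can paste into the dynamic group definition.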
Troubleshoot: Agent runs into OutOfMemoryError
Possible Cause: The agent might run out of heap memory if it is
not tuned properly to support the load that has been assigned to it.
Action: Update the heap memory settings for the Management Agent.
The out-of-box maximum heap configuration for the agent is:
128 MB for the Management Agent as an OCA plugin.
512 MB for the standalone Management Agent (the one downloaded from the
Management Agent console).
You can assign more heap to the agent as follows:
Open the file agent_inst/config/java.options.
Update the heap setting by modifying the following line:
-Xmx512m
This line sets the maximum heap for the agent to 512 MB.
For example, to change the maximum heap to 800 MB, update the line to:
-Xmx800m
Save the file and restart the agent for the changes to take
effect.
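The edit above can be scripted as follows. This sketch assumes the file contains a single -Xmx setting as shown. It rewrites the file in place with cat so the owner and permissions stay unchanged, and it keeps its temporary copy outside the agent directory, since stray files there can block automatic upgrades. The function name is illustrative.

```shell
# Sketch: replace the -Xmx value in a java.options file, e.g. with 800m.
set_agent_max_heap() {
  file="$1"   # path to agent_inst/config/java.options
  size="$2"   # new maximum heap, e.g. 800m
  if ! printf '%s\n' "$size" | grep -Eq '^[0-9]+[mMgG]$'; then
    echo "invalid size: $size" >&2
    return 2
  fi
  if ! grep -Eq -- '-Xmx[0-9]+[kKmMgG]' "$file"; then
    echo "no -Xmx setting found in $file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1            # temp file lives outside the agent dir
  sed -E "s/-Xmx[0-9]+[kKmMgG]/-Xmx$size/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"               # in-place write keeps owner/permissions
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

For example, run set_agent_max_heap /opt/oracle/mgmt_agent/agent_inst/config/java.options 800m as the agent owner or root, then restart the agent (for a standalone agent on a systemd host, sudo systemctl restart mgmt_agent).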
Troubleshoot: OCI Management Agent is not starting on a Windows host
Possible Cause: If the agent fails to start with errors like the following,
the automatic upgrade of the Management Agent may have failed.
C:\Oracle\mgmt_agent\agent_inst\log>NET START mgmt_agent
The Oracle Management Agent service is starting...................
The Oracle Management Agent service could not be started.
A service specific error occurred: 1.
More help is available by typing NET HELPMSG 3547.
In the log file
C:\Oracle\mgmt_agent\agent_inst\log\mgmt_agent.log, you may see messages like the
following:
[SysExecutor.0 (PrometheusEmitter.Agent-discovery)-131] INFO - DiscoveryItemTask PrometheusEmitter.Agent-discovery - autoPromote
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Cleaning up old files...
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - On windows, skipping file owner check
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Starting agent upgrade from version [231002.2039] to version [231002.2040]...
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Inserted RequestSigner associated with request SigningRequester[get([])] for signingKey:SigningKey[xxxxxxxxxxxx]
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Package Stream size:99003892
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully unzipped agent upgrade package at:
C:\Oracle\mgmt_agent\zip\unpack
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully copied C:\Oracle\mgmt_agent\agent_inst\bin\agentUpgrader.bat to
C:\Oracle\mgmt_agent\agent_inst\bin\tmpAgentUpgrader.bat
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully deleted previous wrapper backup file:
C:\Oracle\mgmt_agent\agent_inst\config\wrapper.conf.backedUpForUpgrade
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Backed up wrapper.conf to attempt agent upgrade
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Built macros for processing wrapper.conf as:{%SERVICE_TYPE%=mgmt_agent,%JAVA_HOME%=c:\Program Files\Java\jre-1.8,%EMSTATE%=C:\Oracle\mgmt_agent\agent_inst, %CORE_JAR%=agent-upgrader-1.0.3235.jar,%VERSION%=231002.2039, %ORACLE_HOME%=C:\Oracle\mgmt_agent\231002.2039}
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Processed wrapper.conf.template to point it to agent upgrader
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Exiting for wrapper to spin up the agent upgrader...
Action: To fix the issue:
Stop the Management Agent on the Windows host and then enter the
following commands:
NET STOP mgmt_agent
cd C:\Oracle\mgmt_agent\agent_inst\config
Back up wrapper.conf, for example by renaming it, so that the name wrapper.conf is free for the next step.
Rename wrapper.conf.backedUpForUpgrade to wrapper.conf.
Start the agent again with NET START mgmt_agent. After the upgrade, the agent displays
as Active under Observability & Management.
Troubleshoot: Management Agent automatic upgrade is not working or skipped some Agents
Possible Cause: If the OCI Management Agent automatic upgrade
is not working for some of the Management Agents, the cause may be that some files
or directories under the agent file system are owned by invalid owners.
For example, if some files or directories in the following location
do not have the correct ownership, the agent automatic upgrade does not work:
/opt/oracle/mgmt_agent/agent_inst.
In the log file /opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log, you may
find the following error:
ERROR - Following files are owned by invalid owners: [/opt/oracle/mgmt_agent/db00_cred.json,
/opt/oracle/mgmt_agent/agent_inst/config/emd.properties.backup]
(ManagedAgent upgrade checker)-32] WARN - Files with invalid owners were found, skipping auto-upgrade
Action: On the Management Agent host, confirm that all the files and directories
under the agent file system are owned by the mgmt_agent user and the
mgmt_agent group (mgmt_agent:mgmt_agent), so that the Management Agent
auto-upgrade can complete.
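To find the offending files, you can list everything under the agent directory that is not owned by the expected user and group. This is a minimal sketch that defaults to the /opt/oracle/mgmt_agent location and the mgmt_agent user and group; pass other values for a different installation. The function name is illustrative.

```shell
# Sketch: list files and directories whose owner or group is not the
# expected one (defaults: /opt/oracle/mgmt_agent, mgmt_agent:mgmt_agent).
find_invalid_owners() {
  dir="${1:-/opt/oracle/mgmt_agent}"
  user="${2:-mgmt_agent}"
  group="${3:-mgmt_agent}"
  # Print every entry whose owner OR group differs from the expected one.
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

Review the output, then correct each entry with sudo chown mgmt_agent:mgmt_agent <path>, or move it out of the agent directory.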
Troubleshoot: IP address displayed in the host column when the Management Agent is installed on a Windows host
Problem: The Management Agent is installed on a Windows host, and the Oracle
Cloud Console displays the Windows host IP address instead of the fully qualified
domain name (FQDN) or Windows host name.
Action:
Log in to your Windows host and open the
Control Panel.
Select System and Security and then
select System.
Go to the Computer name, domain, and workgroup
settings section and then click Change
settings.
The System
Properties window displays.
If it's not selected, click Computer
Name.
Find the following message: To rename this computer or
its domain or workgroup, click Change.
Select Change. The Computer
Name/Domain Changes window displays.
For example, if the
FQDN of the Windows host is
FOOBAR004.subnet1ab2regsu.exampletenantreg1.abcvcn.com,
enter the short Windows host name FOOBAR004 in the
Computer Name text box.
Select More. The DNS Suffix
and NetBIOS Computer Name window displays.
In the Primary DNS suffix of this
computer text box, enter the DNS name of the Windows
host.
For example:
subnet1ab2regsu.exampletenantreg1.abcvcn.com
Select OK or Apply
and then close all the open windows.
Restart the Windows host.
Uninstall the existing Management Agent by running the
uninstaller.bat script from a Windows terminal.
Install the Management Agent on the Windows host again.
The Management Agent installation should succeed, and the Agents page
should display the FQDN of the Windows host in the Host column.
Troubleshoot: Management Agent installation fails on SELinux when using external volume
The agent service fails to start after the installation, resulting in a
non-working agent that displays the following messages:
systemctl start mgmt_agent
Job for mgmt_agent.service failed because the control process exited with error code.
See "systemctl status mgmt_agent.service" and "journalctl -xeu mgmt_agent.service" for details.
To confirm, check the service manager logs for error details.
journalctl -xeu mgmt_agent.service
...
Dec 08 15:48:19 ol9-arm systemd[1261408]: mgmt_agent.service: Failed to execute /dir1/oracle/managementagent/agent_inst/bin/agentcore: Permission denied
Dec 08 15:48:19 ol9-arm systemd[1261408]: mgmt_agent.service: Failed at step EXEC spawning /dir1/oracle/managementagent/agent_inst/bin/agentcore: Permission denied
These error messages indicate that SELinux does not allow executing programs from the chosen installation directory.
Action: Contact your system administrator to create the required SELinux policies that allow installing and running the Management Agent.
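When collecting details for your administrator, the following sketch extracts the paths that systemd failed to execute from journal output like the example above. It assumes the "Failed to execute <path>: Permission denied" message format shown here; the function name is illustrative.

```shell
# Sketch: print the distinct paths that systemd could not execute, read from
# journalctl output on stdin. Assumes the message format
# "Failed to execute <path>: Permission denied" shown above.
denied_exec_paths() {
  sed -n 's/.*Failed to execute \(.*\): Permission denied.*/\1/p' | sort -u
}
```

For example, run journalctl -xeu mgmt_agent.service | denied_exec_paths. You can then inspect the SELinux context of each path with ls -Z, check the current mode with getenforce, and review recent denials with ausearch -m avc -ts recent.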
Troubleshoot: Management Agent installation fails on Red Hat Enterprise Linux 9.x
The Management Agent installation fails, and the following error message
may display: mgmt_agent service creation failed. Reason: Detected
Linux.
Additionally, the installation failure log messages may confirm the error and
indicate that the setup attempts to use an incorrect service manager to install the agent.
Cause: Red Hat removed the chkconfig package from
the Red Hat Enterprise Linux (RHEL) 9 distribution. For more details, see the Red Hat Knowledge base.
Action:
Verify the Issue
Confirm the environment uses Red Hat Enterprise Linux 9.x by
running the following
command:
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
The following messages highlight the problem: the OS family was
not identified correctly by the rules in the agentcore script, so the
installation attempts to set up the agent service using init.d
instead of systemctl on RHEL 9.x.
$ rpm -ivh oracle.mgmt_agent.231118.1208.Linux-x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Checking pre-requisites
Checking if any previous agent service exists
Checking if OS has systemd or initd
Checking available disk space for agent install
Checking if /opt/oracle/mgmt_agent directory exists
Checking if 'mgmt_agent' user exists
'mgmt_agent' user already exists, the agent will proceed installation without creating a new one.
Checking Java version
Trying /omc/java/jdk1.8.0_391
Java version: 1.8.0_391 found at /omc/java/jdk1.8.0_391/bin/java
Checking agent version
Updating / installing...
1:oracle.mgmt_agent-231118.1208.1################################# [100%]
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
Initializing software from template
Checking if JavaScript engine is available to use
Creating 'mgmt_agent' daemon
mgmt_agent service creation failed. Reason: Detected Linux:
Installing the mgmt_agent daemon...
ln: failed to create symbolic link '/etc/init.d/mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/K20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/S20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/S20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/K20mgmt_agent': No such file or directory
Service not installed.
warning: %post(oracle.mgmt_agent-231118.1208-1.x86_64) scriptlet failed, exit status 1
Verify the chkconfig package is missing
as described in the following article on the Red Hat Knowledge base.
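The two verification steps above can be combined into one pre-check. This sketch reads the major version from /etc/redhat-release and uses command -v to see whether chkconfig is on the PATH; rpm -q chkconfig gives a package-level answer if you prefer. The function name and its optional arguments are illustrative and exist only to make the check testable.

```shell
# Sketch: succeed (exit 0) when the host is RHEL 9 and chkconfig is missing.
# $1: release file (default /etc/redhat-release); $2: command to look for.
rhel9_missing_chkconfig() {
  release_file="${1:-/etc/redhat-release}"
  cmd="${2:-chkconfig}"
  major=$(sed -n 's/.* release \([0-9][0-9]*\).*/\1/p' "$release_file" 2>/dev/null | head -n 1)
  if [ "$major" = "9" ] && ! command -v "$cmd" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

For example, rhel9_missing_chkconfig && echo "RHEL 9 without chkconfig: see Solution 1".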
Solution 1 - Install the chkconfig package
Install the missing package by executing the following
command:
$ dnf install chkconfig
Validate the package exists in the environment by executing the following
command:
$ rpm -qa | grep chkconfig
Install the Management Agent again.
Solution 2 - Without Installing the chkconfig package
Note
This is a workaround. Use this solution only if the chkconfig
package cannot be installed. The recommended solution is to install the
chkconfig package.
If installing the chkconfig package as described in Solution 1 is not an
option, complete the following steps as an alternative way to install the
Management Agent software.
Use the following commands to:
Switch to a root shell.
Set the environment variable DIST_LINUX_FAMILY_OVERRIDE="Red Hat".
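Typed into a terminal, those two steps might look like the following. The remaining commands are not listed here, so running the agent RPM installation from the same root shell is an assumption, and the RPM file name is a placeholder.

```shell
# Solution 2 sketch. Run interactively; <agent-rpm-file> is a placeholder.
sudo -i                                     # switch to a root shell
export DIST_LINUX_FAMILY_OVERRIDE="Red Hat"
# Assumption: install the agent RPM from this same root shell.
rpm -ivh <agent-rpm-file>
```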
Troubleshoot: Unable To View Prometheus Namespace and Metrics in the OCI Monitoring Service
In the OCI Console, if the required policies are set up correctly but the
Prometheus namespace and metrics are not visible from OCI Monitoring in the Metrics
Explorer, confirm that the mgmt_agent OS user has read
permission on the .properties file.
Possible Cause: The mgmt_agent OS user does not have
permissions to read the .properties file. This file may be owned
by root OS user with 600 permissions.
Confirm the .properties file in
agent_inst/discovery/PrometheusEmitter is owned by the
mgmt_agent OS user and the mgmt_agent OS user has read
permissions on this file.
Restart the OCI Management Agent.
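To spot the problem, the following sketch lists .properties files under a directory that are not owned by a given user. The default user mgmt_agent comes from the text above; pass the PrometheusEmitter directory of your installation. The function name is illustrative.

```shell
# Sketch: list .properties files under a directory not owned by a user.
props_not_owned_by() {
  dir="$1"                    # e.g. <agent_inst>/discovery/PrometheusEmitter
  user="${2:-mgmt_agent}"
  find "$dir" -type f -name '*.properties' ! -user "$user" -print
}
```

For a standalone agent, for example, run props_not_owned_by /opt/oracle/mgmt_agent/agent_inst/discovery/PrometheusEmitter, and correct each file it prints with sudo chown mgmt_agent:mgmt_agent <file> before restarting the agent.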
Troubleshoot: Flag provided but not defined
Error: You see the following error: flag provided but not defined:
-trusted-certs-dir
Action: To resolve the issue, upgrade Oracle Cloud Agent (OCA) to the latest
version. The following example shows a command that produces this error,
together with the usage output that follows it:
$ sudo -u oracle-cloud-agent /usr/libexec/oracle-cloud-agent/plugins/oci-managementagent/oci-managementagent -cli -trusted-certs-dir=/tmp/trustedcerts
flag provided but not defined: -trusted-certs-dir
Usage of /usr/libexec/oracle-cloud-agent/plugins/oci-managementagent/oci-managementagent:
-agent-config string
agent config yml file
-cli
run the monitoring in cli mode
-debug
enable debug logging
-dev
enable dev runs
-force-redeploy
force redeploy image
-metadata-config string
metadata config json file
-oci-config string
oci config file
-staging
enable staging endpoint
-upgrade-native-agent
invoke native agent upgrade
Troubleshoot: Auto upgrade is enabled, but the agent does not upgrade automatically because of an invalid file owner
Cause: You can configure Management Agents
to upgrade automatically. The automatic upgrade option is available
at the tenancy level, so if you select the automatic upgrade option in the Oracle Cloud
Console, all the agents in your OCI tenancy upgrade automatically. After a new version
of the agent is available in the Management Agent cloud service, it may take up to
24 hours for the agent to upgrade automatically.
If the Agent version
does not get updated after waiting for 24 hours, then some issues on the disk could
be preventing the Agent from upgrading automatically.
The most
common cause of this error is that files are owned by an OS user that is different
from the user that installed the Management Agent. The upgrade process runs as the same OS
user as the currently running agent process and cannot switch to the
root user. Any file that a user manually creates in the mgmt_agent directory
can interfere with the agent's ability to upgrade automatically.
You can find the mgmt_agent.log file at the following locations:
In the mgmt_agent.log file, you may see
the following error indicating the problematic files:
2024-08-14 18:13:31,857 [SysExecutor.7 (ManagedAgent upgrade checker)-36] ERROR - Following files are owned by invalid owners: [/opt/oracle/mgmt_agent/agent_inst/config/emd.properties.oldbackup]
2024-08-14 18:13:31,857 [SysExecutor.7 (ManagedAgent upgrade checker)-36] WARN - Files with invalid owners were found, skipping auto-upgrade
Action: You can use the following workaround for this
issue:
Change the ownership and group of the affected files to the user account
that originally installed the Management Agent.
If a file was created with the wrong owner, then you can delete
the file or move the file to another directory outside of the Management
Agent directory. Depending on your installation, you can find the Management
Agent directory at one of the following locations:
For the Standalone Management Agent:
/opt/oracle/mgmt_agent/
For the Management Agent plug-in for an Oracle Cloud
Agent in an OCI Compute Instance:
/var/lib/oracle-cloud-agent
Note
To avoid these
issues, do not manually create any files in the Management Agent
directory.
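Because the log message names the problematic files, you can extract them directly. This sketch assumes the message format shown above, with the whole list on one line, and takes the most recent occurrence; file names that contain commas would be split incorrectly. The function name is illustrative.

```shell
# Sketch: print the files named in the most recent
# "Following files are owned by invalid owners: [...]" log message.
invalid_owner_files() {
  log="${1:-/opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log}"
  grep 'Following files are owned by invalid owners:' "$log" |
    tail -n 1 |
    sed -e 's/.*invalid owners: \[//' -e 's/].*//' |   # keep the list body
    tr ',' '\n' |                                       # one path per line
    sed -e 's/^ *//' -e 's/ *$//' |                     # trim spaces
    grep -v '^$'
}
```

For example, run invalid_owner_files /opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log, then change the owner of each listed file or move it outside the agent directory.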
Troubleshoot Management Agents on Compute Instances
Users may encounter various errors during the deployment of the Oracle
Management Agent on compute instances. Causes and recommended actions for some common errors
are listed below.
Troubleshoot: Agent is in Not Available state and agent log file reports "Invalid tags"
The Management Agents page shows the agent in 'Not Available'
state, and the mgmt_agent.log file (located in the
<Agent_Inst>/log directory) reports the following
message:
ErrorBody:{"code" :
"InvalidParameter","message" : "Invalid tags: Resource
creation failed because the resource requires tag value(s). Add a value to
the each of the following tag definition(s): \nGLOBAL.ComponentType,
GLOBAL.ApplicationName,
Cause:
This issue can happen when the compartment requires mandatory tags for
every resource and the resource creation request does not include the tags. The
activation request then fails with the message "Invalid tags: Resource
creation failed because the resource requires tag value(s)", and the
agent status is shown as 'Not Available'.
Action:
Management Agents
If you have a standalone Management Agent that was installed using
an RPM or a ZIP file, uninstall it and then reinstall it by providing a
response file that uses the DefinedTags parameter, as described
in the Review Agent Parameters section.
Management Agents on Compute Instances
If the
Management Agent is enabled through the OCI Console using the OCA plugin,
there is no response file, because response files are not used for compute
instances. In this case, do the following:
Log in to the instance where the Management Agent is
deployed and switch to the oracle-cloud-agent user using
the following command:
sudo -u oracle-cloud-agent sh
Create an agent.definedtags file in the
following location:
/var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/config/security/resource/
Add the defined tags required for creating the resource
to the agent.definedtags file.
For example, if there are two namespaces,
admin_namespace and finance_namespace,
and each namespace uses two keys with the values
environment_type=non-prod and
sensitivity=restricted, then you can use
the following:
Troubleshoot: Management Agent setup failed with fork/exec oracle.polaris.oca.main: permission denied
Users may encounter this error resulting in failure to install or start the Management Agent.
The error message shown in the Plugin view of compute instance for the Management Agent Plugin looks similar to the following:
workflow.go:23: [ERROR] step [*core.SetupImageStep] execution failed with [setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]]
mgmtagent_image.go:139: [ERROR] bootstrap workflow failed with error setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]
agent.go:74: [ERROR] failed to start agent during bootstrap with [setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]]
Possible Cause:
This issue may happen when a compute instance disallows fork/execute operations from the /tmp directory by mounting the tmpfs with the noexec flag.
To confirm this possible cause, run the following:
$ mount | grep tmpfs
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,inode64)
If the output includes the noexec flag, as in the example above, this is the cause of the error.
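The same check can be scripted. This sketch reads mount output on standard input and succeeds only when /tmp is mounted with noexec; findmnt -no OPTIONS /tmp is an alternative on systems with util-linux. The function name is illustrative.

```shell
# Sketch: succeed when mount(8)-style input shows /tmp mounted with noexec,
# e.g. "tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,inode64)".
tmp_is_noexec() {
  awk '$2 == "on" && $3 == "/tmp" && $0 ~ /[(,]noexec[,)]/ { found = 1 }
       END { exit (found ? 0 : 1) }'
}
```

For example, mount | tmp_is_noexec && echo "/tmp is mounted noexec".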
Action:
Stop Oracle Cloud Agent.
sudo systemctl stop oracle-cloud-agent
Add the following setting to the file: /etc/oracle-cloud-agent/plugins/oci-managementagent/config.yml
overrideTmpDir: true
Start Oracle Cloud Agent.
$ sudo systemctl start oracle-cloud-agent
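If you prefer to make the config.yml change non-interactively, the following sketch sets overrideTmpDir: true, updating an existing overrideTmpDir line or appending one. It assumes the setting is a top-level key, as the step above suggests, and should be run while Oracle Cloud Agent is stopped. The function name is illustrative.

```shell
# Sketch: ensure "overrideTmpDir: true" is present as a top-level key.
ensure_override_tmp_dir() {
  cfg="${1:-/etc/oracle-cloud-agent/plugins/oci-managementagent/config.yml}"
  if grep -q '^overrideTmpDir:' "$cfg" 2>/dev/null; then
    tmp=$(mktemp) || return 1
    sed 's/^overrideTmpDir:.*/overrideTmpDir: true/' "$cfg" > "$tmp" &&
      cat "$tmp" > "$cfg"          # rewrite in place, keeping owner/permissions
    rc=$?
    rm -f "$tmp"
    return $rc
  fi
  # Start on a new line if the file does not already end with one.
  if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
    echo >> "$cfg"
  fi
  echo 'overrideTmpDir: true' >> "$cfg"
}
```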
Troubleshoot: Management Agent authentication failure due to clock skew (the time on the compute instance differs from the time on the server)
Cause: If the clock skew between the compute instance where the agent is
running and the Oracle Cloud Infrastructure Identity service is more than
5 minutes, requests are rejected with HTTP 401.
If you find the
following errors:
On the OCI compute instance, go to the Oracle Cloud
Agent tab. The Management Agent displays an error in the Message
column:
Fix the clock skew and restart the agent. If the
agent has been down for days because of this error, you must clean up
the donotrestart file before restarting the agent.
Additionally, Oracle recommends setting up the OS date and time to synchronize
automatically with NTP servers to avoid downtime in the future. If additional services
are running on the machine, it's best practice to restart the machine after the time
change so that the services can reset with the new time.
To correct the OS date and time on the host where the agent is running and then
restart the agent, follow these steps:
To stop the agent run the following
command:
sudo systemctl stop oracle-cloud-agent
Correct the date and time.
Run the following command to delete the
configure.donotrestart file.
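The 5-minute threshold above can be expressed as a small check on two epoch timestamps. This sketch is illustrative only: the function name is hypothetical, how you obtain a trusted reference time is up to you, and the 300-second default simply mirrors the 5-minute limit described in this section.

```shell
# Sketch: succeed (exit 0) when two epoch-second timestamps differ by more
# than a limit. The 300-second default mirrors the 5-minute limit above;
# obtaining a trusted reference time is left to you.
clock_skew_exceeded() {
  local_now="$1"
  reference="$2"
  limit="${3:-300}"
  diff=$((local_now - reference))
  if [ "$diff" -lt 0 ]; then
    diff=$((-diff))
  fi
  [ "$diff" -gt "$limit" ]
}
```

For ongoing protection, timedatectl shows whether the system clock is synchronized, and sudo timedatectl set-ntp true enables NTP synchronization on systemd hosts.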
Troubleshoot: OCI Management Agent Service: Agent Not Visible In OCI Console Under Observability & Management
The OCI Management Agent was installed successfully on a compute instance, and the
agent is running on the host. However, the agent does not appear in the Oracle Cloud
Console when you open the navigation menu, select Observability & Management, go to
Management Agents, and then select Agents.
Possible Cause: The compartments of the Compute Instance and
Agent Install Key are different.
Action:
Stop and uninstall the Management Agent on the Compute
Instance.
Cause: In some cases, it may be necessary to remove an existing
Management Gateway installation in order to reinstall it.
Action:
Check if the gateway is running:
For OL7: systemctl status mgmt_gateway
For OL6: /sbin/initctl status
mgmt_gateway
If the gateway is running, stop it:
For OL7: systemctl stop mgmt_gateway
For OL6: /sbin/initctl stop
mgmt_gateway
Remove the installed Gateway RPM using the following
command: rpm -e oracle.mgmt_gateway --noscripts
Remove any remaining Gateway files using the following
command:
rm -rf /opt/oracle/mgmt_agent
Remove the gateway service file:
For OL7: rm -rf
/etc/systemd/system/mgmt_gateway.service
For OL6: rm -rf
/etc/init/mgmt_gateway.conf
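The OL7 and OL6 variants above differ only in the service manager. This sketch guesses which one a host uses: systemd is detected by the /run/systemd/system directory, the usual check for a systemd-booted system, and Upstart (OL6) by an executable /sbin/initctl. The function name and its optional root argument are illustrative and exist only for testing.

```shell
# Sketch: print "systemd", "upstart", or "unknown" for the running host
# (or for a test tree when a root prefix is given).
service_manager() {
  root="${1:-}"
  if [ -d "$root/run/systemd/system" ]; then
    echo systemd
  elif [ -x "$root/sbin/initctl" ]; then
    echo upstart
  else
    echo unknown
  fi
}
```

For example: case "$(service_manager)" in systemd) systemctl status mgmt_gateway ;; upstart) /sbin/initctl status mgmt_gateway ;; esac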
Troubleshoot: Configure Management Gateway
Cause: In some cases, the hostname cannot be resolved in the
installation environment, which can cause the installation to fail with the
following error message:
Troubleshoot: "Could not resolve hostname <hostname value> in
the installation environment. Resolve the hostname or provide the
GatewayCertCommonName in the response file and rerun the gateway setup
script."
Action:
Check that the hostname of the environment resolves to a fully
qualified domain name (FQDN) by running the command: hostname
-f
Optionally, you can provide a custom FQDN for the gateway configuration
by setting the GatewayCertCommonName property in the input response file. See
Response File Parameters.
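Before rerunning the gateway setup, you can check whether hostname -f returns a usable FQDN. This heuristic sketch only requires at least one dot, no empty labels, and a name other than localhost; it does not query DNS, and the function name is illustrative.

```shell
# Sketch: heuristic FQDN check; succeeds for names like "gw01.example.com".
is_fqdn() {
  name="$1"
  case "$name" in
    localhost | localhost.*) return 1 ;;
    "" | .* | *. | *..*) return 1 ;;    # empty or with empty labels
    *.*) return 0 ;;
    *) return 1 ;;                      # short name without a domain
  esac
}
```

For example: is_fqdn "$(hostname -f)" || echo "Provide GatewayCertCommonName in the response file"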
Cause: In some cases, the Management Gateway installation might
fail with the following error message due to the absence of policies in OCI or
because of resource limit issues in the tenancy. If you see the following error,
follow the steps below.
Troubleshoot: "Failed to start Management Gateway as certificates
could not be created, initialized or retrieved in OCI. Please check the logs
for more details."
Action:
Open the log file in the Management Gateway installation directory,
for example:
/opt/oracle/mgmt_agent/plugins/GatewayProxy/statedir/log/mgmt_gateway.log
If the log file contains 404 errors such as the following, use this option
to resolve the issue:
2023-07-25 15:38:06.694/CEST [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Response String { "code" : "NotAuthorizedOrNotFound", "message" : "Authorization failed or requested resource not found."}
2023-07-25 15:38:06.696/CEST [pool-3-thread-1] ERROR com.oracle.mgmtagent.proxy.ProxyServer - Error while initializing and loading certificate bundles com.oracle.mgmtagent.proxy.exception.CertificateFailureException: The response status is 404 after multiple retries at com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility.executeRequest(CertificateUtility.java:293) ~
Manually add and confirm the correct dynamic groups and
policies required for installing the Management Gateway are added to
the specific compartment within the tenancy where you want to
install the Management Gateway. For more information, see Perform Prerequisites for Deploying Management Gateway.
If the log file contains 400 errors such as the following, review these
options to resolve the issue:
2023-09-20 18:51:32.772/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateCreationUtil - Create Vault Service Url invoked https://kms.us-ashburn-1.oraclecloud.com/20180608/vaults
2023-09-20 18:51:33.400/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Received response code 400
2023-09-20 18:51:33.400/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Header name opc-request-id , value /5704D03441842D3818B824B2D6B2712E/1D1FED893474FDA900188E24F3DEE59B
2023-09-20 18:51:33.401/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Response String { "code" : "LimitExceeded", "message" : "The limit for this tenancy has been exceeded."}
Check the limit for the Default Vault Count
resource for the Key Management Service in OCI console. You can
raise a request to increase the resource limits. For more
information, see Managing Keys
and Managing
Vaults.
When you create
the Issued by internal CA certificates, the Certificate
Profile must be either TLS Server or TLS Client
and only the RSA signing algorithms are supported.
If the logs show any other failures related to the Vault or Key
service APIs, raise a request with the
oci_kms team and provide the response body and
opc-request-id.
If the logs show any other failures related to Certificate Authorities
or Certificate service APIs, raise a request with the
oci_certificates team and provide the response body and
opc-request-id.
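Because these requests need the response body and opc-request-id, a sketch like the following could pull both out of a gateway log, together with the error code (for example, LimitExceeded) that indicates which of the options above applies. It assumes the log lines have the same format as the 400 example shown earlier; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse CertificateUtility log lines such as:
#   ... Header name opc-request-id , value /5704.../1D1F...
#   ... Response String { "code" : "LimitExceeded", "message" : "..."}

# Print each distinct error code found in "Response String" lines.
list_error_codes() {
  sed -n 's/.*Response String.*"code"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Print request IDs and response bodies, in log order, for a support request.
collect_support_details() {
  sed -n \
    -e 's/.*Header name opc-request-id , value \(.*\)$/opc-request-id: \1/p' \
    -e 's/.*Response String \(.*\)$/response: \1/p' \
    "$1"
}
```

If list_error_codes prints LimitExceeded, the vault limit option applies; for other codes, attach the output of collect_support_details to the request for the relevant team.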
Troubleshoot: Management Gateway installation fails on Red Hat
Enterprise Linux 9.x
The Management Gateway installation
fails, and the following error message may be displayed: mgmt_gateway service creation
failed. Reason: Detected Linux.
The installation log messages may also confirm the error and show that the setup
attempts to use the wrong service manager to install the gateway.
Cause:
Red Hat removed the chkconfig package from the Red Hat Enterprise
Linux (RHEL) 9 distribution. For more details, see the Red Hat Knowledge base.
Action:
Verify the Issue
Confirm the environment uses Red Hat Enterprise Linux 9.x by
running the following
command:
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
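If you want to script this check, a minimal sketch could test the release file directly. The function name is hypothetical, and it only inspects the text of /etc/redhat-release (or a file you pass in), which is assumed to have the same format as the output above.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the release file reports RHEL 9.x.
is_rhel9() {
  local release_file="${1:-/etc/redhat-release}"
  [ -r "$release_file" ] &&
    grep -Eq '^Red Hat Enterprise Linux release 9([.[:space:]]|$)' "$release_file"
}
```

For example, is_rhel9 && echo "RHEL 9.x detected" prints the message only on a RHEL 9.x host.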
The messages below highlight the problem: the rules in the agentcore script do
not identify the OS family correctly, so on RHEL 9.x the installation attempts
to set up the service using init.d instead of systemctl.
$ rpm -ivh oracle.mgmt_gateway.231118.1208.1702955171.Linux-x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Checking pre-requisites
Checking if any previous gateway service exists
Checking if OS has systemd or initd
Checking available disk space for gateway install
Checking if /opt/oracle/mgmt_agent directory exists
Checking if 'mgmt_agent' user exists
'mgmt_agent' user already exists, the gateway will proceed installation without creating a new one.
Checking Java version
Trying /omc/java/jdk1.8.0_391
Java version: 1.8.0_391 found at /omc/java/jdk1.8.0_391/bin/java
Checking agent version
Updating / installing...
1:oracle.mgmt_gateway-231118.1208.1################################# [100%]
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
Initializing software from template
Checking if JavaScript engine is available to use
Creating 'mgmt_gateway' daemon
mgmt_gateway service creation failed. Reason: Detected Linux:
Installing the mgmt_gateway daemon...
ln: failed to create symbolic link '/etc/init.d/mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/K20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/S20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/S20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/K20mgmt_gateway': No such file or directory
Service not installed.
warning: %post(oracle.mgmt_gateway-231118.1208.1702955171-1.x86_64) scriptlet failed, exit status 1
Verify that the chkconfig package is missing,
as described in the following article on the Red Hat Knowledge base.
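The ln errors in the installation output show that /etc/init.d, /etc/rc3.d and /etc/rc5.d do not exist on the host. On RHEL 9 these paths are expected to come from the chkconfig package (an assumption you can confirm with rpm -qf /etc/init.d on a working host), so a quick sketch can check both the package and the paths before you reinstall. The function names are hypothetical; the optional root argument exists only so the path check can be tried against a test directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install checks for the init.d paths the installer links into.

# Succeed if the chkconfig package is installed (rpm -q exits non-zero otherwise).
chkconfig_installed() {
  rpm -q chkconfig >/dev/null 2>&1
}

# Print each path from the failed ln commands that is missing; fail if any is.
# The optional first argument is a root directory prefix (default: /).
check_initd_paths() {
  local root="${1:-}" missing=0 dir
  for dir in etc/init.d etc/rc3.d etc/rc5.d; do
    if [ ! -e "$root/$dir" ]; then
      echo "missing: /$dir"
      missing=1
    fi
  done
  return "$missing"
}
```

On an affected host, chkconfig_installed fails and check_initd_paths lists the missing paths; after installing chkconfig, both are expected to succeed.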
Solution 1 - Install the
chkconfig package
Install the missing package by executing the following
command:
$ dnf install chkconfig
Validate the package exists in the environment by executing the
following
command:
$ rpm -qa | grep chkconfig
Install the Management Gateway again.
Solution 2 - Without Installing the chkconfig package
Note
This is a workaround. Use this solution only
if the chkconfig package cannot be installed. The
recommended solution is to install the chkconfig package.
If installing the chkconfig package as described in Solution 1 is not an
option, complete the following steps to install the Management Gateway
software instead.
Use the following commands to:
Switch to a root shell.
Set the environment variable
DIST_LINUX_FAMILY_OVERRIDE="Red Hat".
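A minimal sketch of these steps follows. It assumes you are already in a root shell (for example, after sudo su -), because sudo resets the environment by default and the variable must reach the RPM installation scripts. The function name and the DRY_RUN switch are hypothetical, and passing the variable directly to rpm -ivh is an assumption about how the override is applied.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: install the gateway RPM with the OS family override set.
# Set DRY_RUN=1 to print the command instead of running it.
install_gateway_with_override() {
  local rpm_file="$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'DIST_LINUX_FAMILY_OVERRIDE="Red Hat" rpm -ivh %s\n' "$rpm_file"
    return 0
  fi
  # The override must be visible to the RPM scriptlets, so run as root directly.
  if [ "$(id -u)" -ne 0 ]; then
    echo "Run this from a root shell (for example, sudo su -)." >&2
    return 1
  fi
  DIST_LINUX_FAMILY_OVERRIDE="Red Hat" rpm -ivh "$rpm_file"
}
```

For example, install_gateway_with_override oracle.mgmt_gateway.231118.1208.1702955171.Linux-x86_64.rpm, using the RPM file name from your own download.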
Troubleshoot: Management Gateway Installation Fails
With Error: Certificates could not be created and the Identity logs report:
Authentication failed: DATE_OUTSIDE_CLOCK_SKEW
# /opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<PATH>/gateway_agent.rsp
/opt/oracle/mgmt_agent/agent_inst/bin/setupAgent.sh opts=<PATH>/gateway_agent.rsp
Executing configure
Parsing input response file
Validating install key
Generating communication wallet
Generating security artifacts
Registering Management Gateway
Found service plugin(s):[GatewayProxy]
Starting gateway...
Gateway started successfully
Starting plugin deployment for: [GatewayProxy]
Deploying service plugin(s)...Done.
GatewayProxy : Successfully deployed external plugin
Gateway setup completed and the gateway is running.
In the future gateway can be started by directly running: sudo systemctl start mgmt_gateway
Please make sure that you delete <PATH>/gateway_agent.rsp or store it in secure location.
Creating gateway system properties file
Creating properties file
Creating or validating certificates
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Failed to start Management Gateway as certificates could not be created, initialized or retrieved in OCI. Please check the logs for more details.
Management Gateway stopped
Action:
On the host where the Management Gateway is installed, make sure the system time is
correct, and then install the Management Gateway again.
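To see how far off the host clock is, you can check timedatectl, which reports whether the system clock is synchronized, or compare the local time with the Date header returned by an HTTPS endpoint. The sketch below does the latter. The 300-second threshold and the default URL are assumptions, not values from this documentation; use an endpoint the host can reach, through your proxy if needed.

```shell
#!/usr/bin/env bash
# Sketch: estimate host clock skew against the Date header of an HTTPS endpoint.
# MAX_SKEW (300 s) and the default URL below are assumptions.
MAX_SKEW="${MAX_SKEW:-300}"

# Absolute difference in seconds between two epoch timestamps.
skew_seconds() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"
}

# Succeed if the skew between the two timestamps is within MAX_SKEW.
skew_ok() {
  [ "$(skew_seconds "$1" "$2")" -le "$MAX_SKEW" ]
}

# Read the server's Date header and compare it with the local clock (GNU date).
check_host_clock() {
  local url="${1:-https://identity.us-ashburn-1.oraclecloud.com}"
  local server_date ref_epoch now
  server_date=$(curl -sI "$url" | tr -d '\r' | sed -n 's/^[Dd]ate: //p' | head -n 1)
  if [ -z "$server_date" ]; then
    echo "Could not read a Date header from $url" >&2
    return 2
  fi
  ref_epoch=$(date -d "$server_date" +%s)
  now=$(date +%s)
  echo "Local clock differs from $url by $(skew_seconds "$now" "$ref_epoch") s"
  skew_ok "$now" "$ref_epoch"
}
```

If check_host_clock fails, correct the system time (for example, by enabling NTP synchronization) before running the gateway installation again.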
Troubleshoot: Timed Out Error When
Installing or Configuring Management Gateway
The OCI Console may display the Management Gateway as active
even though its metrics are not populating.
If the setup times out, you may see output such as the
following:
/opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<user_home_directory>/gateway.rsp
Starting gateway...
Gateway started successfully
Starting plugin deployment for: [GatewayProxy]
Deploying service plugin(s)...............Timed out.
Agent is unable to check if it deployed requested service plugin(s) successfully or not.
Please check back later on the console.
Cause: A network communication issue may cause the Management Gateway setup
task to take longer than expected, which causes the Management Gateway setup
to time out.
Action: To complete the setup:
Confirm there are no network communication issues.
If you use a proxy, verify that the correct proxy details were updated in the
response file to rule out a proxy issue. For example, confirm the proxy host and
port values in the response file:
ProxyHost = my.proxyhost.com
ProxyPort = 80
Stop the Management Gateway using the following command:
systemctl stop mgmt_gateway
Re-run the Management Gateway setup using the following
command: