These topics cover some common issues you might run into and how to address them.
Patching Failures on Exadata Database Service on Cloud@Customer Systems
Patching operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the virtual machine cannot access the object store.
Determining the Problem
In the Console, you can identify a failed patching operation by viewing the patch history of an Exadata Database Service on Cloud@Customer system or an individual database.
A patch that was not successfully applied displays a status of Failed and includes a brief description of the error that caused the failure. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this topic for a solution.
Troubleshooting and Diagnosis
Diagnose the most common issues that can occur during the patching process of any of the Exadata Database Service on Cloud@Customer components.
Database Server VM Issues
One or more of the following conditions on the database server VM can cause patching operations to fail.
Database Server VM Connectivity Problems
Cloud tooling relies on proper networking and connectivity configuration between the virtual machines of a given VM cluster. If the configuration is not set up properly, any operation that requires cross-node processing can fail. For example, a virtual machine might be unable to download the files required to apply a given patch.
In this case, you can perform the following actions:
- Verify that your DNS configuration is correct so that the relevant virtual machine addresses are resolvable within the VM cluster.
- Refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.
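The DNS check above can be sketched as a small loop that tries to resolve each node name. This is a sketch only: localhost stands in for your actual VM cluster hostnames, which you would substitute in the list.

```shell
# Sketch: confirm that each VM cluster node name resolves from this node.
# "localhost" is a stand-in; list your actual cluster hostnames instead.
status=""
for node in localhost; do
  if getent hosts "$node" >/dev/null 2>&1; then
    status="$status $node:ok"
    echo "$node: resolvable"
  else
    status="$status $node:fail"
    echo "$node: NOT resolvable"
  fi
done
```

Run this on each VM in the cluster; any name reported as NOT resolvable points to a DNS configuration problem.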
Oracle Grid Infrastructure Issues
One or more of the following conditions on Oracle Grid Infrastructure can cause patching operations to fail.
Oracle Grid Infrastructure is Down
Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. The cluster software program must be up and running on the VM Cluster for patching operations to complete. Occasionally you might need to restart the Oracle Clusterware to resolve a patching failure.
To check the status of Oracle Clusterware:
./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
To restart Oracle Clusterware on all nodes, and then verify the status:
crsctl start cluster -all
crsctl check cluster
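The health check can also be scripted. The snippet below scans crsctl check cluster output for any service that is not online; because it assumes no live cluster, the output is captured here as a sample heredoc rather than from the real command.

```shell
# Sketch: scan 'crsctl check cluster' output for services that are not online.
# The output is a sample heredoc; on a real node, pipe the command output instead.
status=$(cat <<'EOF'
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
EOF
)
# grep -v selects lines that do NOT contain "is online"; -q suppresses output.
if echo "$status" | grep -qv "is online"; then
  verdict="degraded"
  echo "cluster degraded - consider: crsctl start cluster -all"
else
  verdict="healthy"
  echo "cluster healthy"
fi
```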
Oracle Databases Issues
An improper database state can lead to patching failures.
Oracle Database is Down
The database must be active and running on all the active nodes so the patching operations can be completed successfully across the cluster.
To check the status of a database:
srvctl status database -d db_unique_name -verbose
The system returns a message that includes the database instance status. The instance status must be Open for the patching operation to succeed.
To start a database that is not running:
srvctl start database -d db_unique_name -o open
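Acting on that status can be sketched as follows. The sample line mimics srvctl status database -verbose output; the instance and node names are illustrative, not taken from any real system.

```shell
# Sketch: decide whether a database start is needed from a sample
# 'srvctl status database ... -verbose' output line (names are illustrative).
line="Instance ORCL1 is running on node node1. Instance status: Open."
case "$line" in
  *"Instance status: Open"*)
    action="none"
    echo "instance open - no action needed" ;;
  *)
    action="start"
    echo "run: srvctl start database -d db_unique_name -o open" ;;
esac
```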
Obtaining Further Assistance
If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.
Collecting Cloud Tooling Logs
Collect the relevant log files that can assist Oracle Support in the investigation and resolution of a given issue.
Collecting Oracle Diagnostics
To collect the relevant Oracle diagnostic information and logs, run the dbaascli diag collect command.
For more information about the usage of this utility, see DBAAS Tooling: Using dbaascli to Collect Cloud Tooling Logs and Perform a Cloud Tooling Health Check.
VM Operating System Update Hangs During Database Connection Drain
Description: This is an intermittent issue. During a virtual machine operating system update with 19c Grid Infrastructure and running databases, dbnodeupdate.sh waits for RHPhelper to drain the connections, which does not progress because of a known bug "DBNODEUPDATE.SH HANGS IN RHPHELPER TO DRAIN SESSIONS AND SHUTDOWN INSTANCE".
This bug can result in one of the following two scenarios:
- VM operating system update hangs in RHPhelper:
- Hangs the automation.
- Some or none of the database connections will have drained, and some or all of the database instances will remain running.
- VM operating system update does not drain database connections:
- Does not hang the automation.
- Some or none of the database connection draining completes.
The /var/log/cellos/dbnodeupdate.trc trace file shows this as the last action:
(ACTION:) Executing RHPhelper to drain sessions and shutdown instances. (trace:/u01/app/grid/crsdata/scaqak04dv0201/rhp//executeRHPDrain.150721125206.trc)
- Upgrade Grid Infrastructure version to 19.11 or above.
- Disable rhphelper before updating and enable it back after updating is completed.
To disable before updating is started:
/u01/app/<grid-version>/grid/srvm/admin/rhphelper /u01/app/<grid-version>/grid <grid-version> -setDrainAttributes ENABLE=false
To enable after updating is completed:
/u01/app/<grid-version>/grid/srvm/admin/rhphelper /u01/app/<grid-version>/grid oracle-home-current-version -setDrainAttributes ENABLE=true
If you disable rhphelper, then there is no database connection draining before database services and instances are shut down on a node before the operating system is updated.
- If you missed disabling RHPhelper and the upgrade hangs because draining of the services is taking a long time, then perform the following steps:
- Inspect the /var/log/cellos/dbnodeupdate.trc trace file, which contains a paragraph similar to the following:
(ACTION:) Executing RHPhelper to drain sessions and shutdown instances. (trace: /u01/app/grid/crsdata/<nodename>/rhp//executeRHPDrain.150721125206.trc)
- Open the rhphelper trace file to check its progress. If rhphelper fails, then the trace file contains the following message:
"Failed execution of RHPhelper"
If rhphelper hangs, then the trace file contains the following as its last message:
(ACTION:) Executing RHPhelper to drain sessions and shutdown instances.
- Identify the rhphelper processes running at the operating system level and kill them.
There are two processes that have the string "rhphelper" in the name: a Bash shell and the underlying Java program that does the actual work. Both run as root, so they must be killed as root:
[opc@<HOST> ~] pgrep -lf rhphelper
191032 rhphelper
191038 java
[opc@<HOST> ~] sudo kill -KILL 191032 191038
- Verify that the dbnodeupdate.trc file moves forward and that the Grid Infrastructure stack on the node is shut down.
For more information about RHPhelper, see Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (Doc ID 2385790.1).
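The identify-and-kill steps above can be combined into a small sketch. It uses pgrep -x (exact process-name match) rather than the -lf form shown earlier, so that only processes actually named rhphelper are selected; run it as root on the affected node.

```shell
# Sketch: find leftover rhphelper processes and kill them if present.
# Uses pgrep -x (exact name match); run as root on the affected node.
pids=$(pgrep -x rhphelper || true)   # pgrep exits non-zero when nothing matches
if [ -n "$pids" ]; then
  kill -KILL $pids                   # unquoted on purpose: one PID per word
  msg="killed: $pids"
else
  msg="no rhphelper processes found"
fi
echo "$msg"
```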
Adding a VM to a VM Cluster Fails
[FATAL] [INS-32156] Installer has detected that there are non-readable files in oracle home.
CAUSE: Following files are non-readable, due to insufficient permission
oracle.ahf/data/scaqak03dv0104/diag/tfa/tfactl/user_root/tfa_client.trc
ACTION: Ensure the above files are readable by grid.
Cause: The installer has detected a non-readable trace file, created by Autonomous Health Framework (AHF) in the Oracle home, which causes adding a cluster VM to fail. AHF, running as root, created a trc file with root ownership that the grid user is not able to read.
Action: Ensure that the file is readable by the grid user before you add VMs to a VM cluster. To fix the permission issue, run the following commands as root on all the existing VM cluster VMs:
chown grid:oinstall /u01/app/<grid-version>/grid/srvm/admin/logging.properties
chown -R grid:oinstall /u01/app/<grid-version>/grid/oracle.ahf*
chown -R grid:oinstall /u01/app/grid/oracle.ahf*
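Files like this can be spotted ahead of time with a find expression that lists files the oinstall group cannot read. This is a sketch: a temporary directory containing a 600-permission file stands in for the Grid home and the root-owned AHF trace.

```shell
# Sketch: list files that are not group-readable under a directory.
# A temp dir with a 600-mode file stands in for the Grid home here.
dir=$(mktemp -d)
touch "$dir/tfa_client.trc"
chmod 600 "$dir/tfa_client.trc"   # owner-only, like the root-owned AHF trace
found=$(find "$dir" -type f ! -perm -g+r)
echo "$found"
rm -rf "$dir"
```

On a real system, point the find at the Grid home and run the chown commands above for anything it reports.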
Nodelist is not Updated for Data Guard-Enabled Databases
Description: Adding a VM to a VM cluster completes successfully; however, for Data Guard-enabled databases, the new VM is not added to the nodelist in the <db>.ini file.
Cause: Data Guard-enabled databases are not extended to the newly added VM, and therefore the <db>.ini file is not updated, because the database instance is not configured on the new VM.
Action: To add an instance to primary and standby databases and to the new VMs (Non-Data Guard), and to remove an instance from a Data Guard environment, see My Oracle Support note 2811352.1.
CPU Offline Scaling Fails
** CPU Scale Update **
An error occurred during module execution. Please refer to the log file for more information
Cause: After provisioning a VM cluster, the /var/opt/oracle/cprops/cprops.ini file, which is automatically generated by the database as a service (DBaaS), is not updated with the common_dcs_agent_port parameters, and this causes CPU offline scaling to fail.
Action: As the root user, manually add the missing entries in the /var/opt/oracle/cprops/cprops.ini file. The common_dcs_agent_port value is 7070 always.
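A quick way to check whether the port entry is present before scaling is a grep against the file. This is a sketch: the sample file created here stands in for /var/opt/oracle/cprops/cprops.ini.

```shell
# Sketch: check a cprops.ini-style file for the common_dcs_agent_port entry.
# A sample file stands in for /var/opt/oracle/cprops/cprops.ini here.
ini=$(mktemp)
printf 'some_existing_key=value\n' > "$ini"
if grep -q '^common_dcs_agent_port=' "$ini"; then
  result="present"
else
  result="missing"
  echo "common_dcs_agent_port missing - add common_dcs_agent_port=7070"
fi
rm -f "$ini"
```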
netstat -tunlp | grep 7070
tcp 0 0 <IP address 1>:7070 0.0.0.0:* LISTEN 42092/java
tcp 0 0 <IP address 2>:7070 0.0.0.0:* LISTEN 42092/java
You can specify either of the two IP addresses, <IP address 1> or <IP address 2>, for the agent entry in the cprops.ini file.
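Extracting the candidate addresses from that netstat output can be sketched with awk. The addresses below are illustrative stand-ins for <IP address 1> and <IP address 2>.

```shell
# Sketch: pull the listening IP addresses out of netstat output like the above.
# Sample output with illustrative addresses stands in for the live command.
out='tcp 0 0 10.0.0.1:7070 0.0.0.0:* LISTEN 42092/java
tcp 0 0 10.0.0.2:7070 0.0.0.0:* LISTEN 42092/java'
# Field 4 is "<IP>:7070"; split on ":" and keep the address part.
ips=$(echo "$out" | awk '{split($4, a, ":"); print a[1]}')
echo "$ips"
```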