Troubleshooting DevOps

Use this troubleshooting information to identify and address common issues that can occur while working with the DevOps service.

Deploying Applications to OKE

The Apply Kubernetes manifest step can fail for several reasons.

Authorization error:

The cluster might not exist, or the dynamic group and policy that give the pipeline resource access to other Oracle Cloud Infrastructure (OCI) resources in the compartment might be missing.

Check that the relevant OCI resources exist, that a dynamic group is created for the OCI DevOps deployment pipeline resource, and that a policy allows this dynamic group to access the relevant OCI resources.

For example, create a dynamic group for the deployment pipeline. You can name the dynamic group DeployDynamicGroup. Replace compartmentOCID with the OCID of your compartment: All {resource.type = 'devopsdeploypipeline', resource.compartment.id = 'compartmentOCID'}

Create an IAM policy to allow the dynamic group to access all the resources: Allow dynamic-group DeployDynamicGroup to manage all-resources in compartment <compartment_name>
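As a rough illustration, the two statements above can be assembled from your own values before you enter them in the Console. The helper names below are hypothetical and only perform string substitution. They don't call any OCI API, and the sample OCID and compartment name are placeholders.

```python
def dynamic_group_rule(compartment_ocid: str) -> str:
    """Build the matching rule for deployment pipelines in one compartment."""
    return (
        "All {resource.type = 'devopsdeploypipeline', "
        f"resource.compartment.id = '{compartment_ocid}'}}"
    )


def policy_statement(group_name: str, compartment_name: str) -> str:
    """Build the policy statement that grants the dynamic group access."""
    return (
        f"Allow dynamic-group {group_name} to manage all-resources "
        f"in compartment {compartment_name}"
    )


# Placeholder values, not real identifiers.
print(dynamic_group_rule("ocid1.compartment.oc1..exampleuniqueid"))
print(policy_statement("DeployDynamicGroup", "my-compartment"))
```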

See DevOps IAM Policies.

Missing protocol field:

The following error message might be displayed: io.kubernetes.client.openapi.ApiException: class V1Status { apiVersion: v1 code: 500 details: null kind: Status message: failed to create typed patch object: .spec.template.spec.containers[name=\"helloworld\"].ports: element 0: associative list with keys has an element that omits key field \"protocol\" metadata: class V1ListMeta { _continue: null remainingItemCount: null resourceVersion: null selfLink: null } reason: null status: Failure }

The Kubernetes manifest must include the protocol field wherever a container port is defined.

This is a known bug in Kubernetes Server-Side Apply on clusters running a version earlier than 1.20. For more information, see issues 130 and 92332.

Add the protocol field (for example, protocol: TCP) to every container port definition in the manifest.
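To find the affected ports before running the pipeline, a check along these lines might help. It is a minimal sketch that assumes the manifest has already been loaded into a Python dictionary (for example, from YAML) and that it is a Deployment-like workload with its pod spec under spec.template.spec. The function name is hypothetical.

```python
from typing import Any, Dict, List


def ports_missing_protocol(manifest: Dict[str, Any]) -> List[str]:
    """Return 'container:port' labels for container ports without a protocol."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for group in ("initContainers", "containers"):
        for container in pod_spec.get(group) or []:
            for port in container.get("ports") or []:
                if "protocol" not in port:
                    missing.append(
                        f"{container.get('name')}:{port.get('containerPort')}"
                    )
    return missing


# Sample manifest: the first container omits the protocol field.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "helloworld", "ports": [{"containerPort": 8080}]},
        {"name": "sidecar", "ports": [{"containerPort": 9090, "protocol": "TCP"}]},
    ]}}},
}

print(ports_missing_protocol(deployment))  # ['helloworld:8080']
```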

Socket timeout:

The error message might be: io.kubernetes.client.openapi.ApiException: java.net.SocketTimeoutException: connect timed out

This error might occur if the Kubernetes public endpoint isn't reachable.

The Kubernetes API endpoint must be a valid, reachable address. For a public IP endpoint, check the network configuration of the cluster. See examples.
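A quick TCP connection test can show whether the endpoint accepts connections at all. The sketch below uses only the Python standard library. The function name is hypothetical, and the address and port in the usage comment are placeholders that you would replace with your cluster's public endpoint. A successful result from your own machine only shows that the endpoint is publicly reachable from there. It doesn't prove that the DevOps service can reach it.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unroutable addresses.
        return False


# Example usage (placeholder address and port):
# endpoint_reachable("203.0.113.10", 6443)
```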

Deployment status fails because of timeout:

This might occur if the Kubernetes deployment time exceeds the progress deadline.

By default, the Kubernetes progress deadline is 600 seconds. See Progress Deadline Seconds.

If the pods of the Kubernetes deployment aren't successfully rolled out within the deadline, this error message is shown in the deployment logs. For more information, see Failed Deployment.

Check the logs for these pods on the cluster.
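Besides the pod logs, the Deployment's status conditions record whether the deadline was exceeded. The sketch below assumes you have the Deployment as a dictionary, for example by parsing the output of kubectl get deployment <name> -o json. It looks for the Progressing condition with reason ProgressDeadlineExceeded that Kubernetes sets on a stalled rollout. The function name is hypothetical.

```python
def progress_deadline_exceeded(deployment: dict) -> bool:
    """Return True if the Deployment reports that its rollout missed the deadline."""
    for cond in deployment.get("status", {}).get("conditions") or []:
        if (
            cond.get("type") == "Progressing"
            and cond.get("status") == "False"
            and cond.get("reason") == "ProgressDeadlineExceeded"
        ):
            return True
    return False


# Sample status of a stalled rollout.
stalled = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
            {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"},
        ]
    }
}

print(progress_deadline_exceeded(stalled))  # True
```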

Vulnerability Audit Failure

The VulnerabilityAudit step fails in the Managed Build stage.

Invalid pom.xml file configuration or failure in client JAR processing pom.xml files:

The Application Dependency Management (ADM) Maven client JAR fails to create the Bill of Materials (BOM) (payload for creating the VulnerabilityAudit step) because of invalid pom.xml file configuration or failure in client JAR processing pom.xml files.

  1. Fix and validate the pom.xml file.
  2. If the pom.xml file is valid and the step still fails, open a service request.

Timeout or failure in VulnerabilityAudit step:

The VulnerabilityAudit step is created but never reaches a successful final status because of timeout or failure in the VulnerabilityAudit step.

Open a service request for the failure.

Validation error:

A validation error might occur if an incorrect knowledgeBaseId or vulnerabilityCompartmentId is entered in the build specification file.

Check the values provided in the build spec file.
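Malformed values, such as a display name pasted in place of an OCID, can be caught with a simple shape check. The sketch below assumes the general OCID layout of dot-separated parts that start with ocid1, and it takes the two values as a flat dictionary rather than parsing the build spec file. Both function names are hypothetical. The check can't tell whether a well-formed OCID refers to the right resource.

```python
from typing import Dict, List


def looks_like_ocid(value: object) -> bool:
    """Loosely check the dot-separated shape of an OCID string."""
    if not isinstance(value, str):
        return False
    parts = value.strip().split(".")
    return (
        len(parts) in (5, 6)
        and parts[0] == "ocid1"
        # Resource type, realm, and unique ID must be non-empty.
        and all(parts[i] for i in (1, 2, -1))
    )


def malformed_ids(values: Dict[str, object]) -> List[str]:
    """Return the names of the entries whose value is not shaped like an OCID."""
    return [name for name, value in values.items() if not looks_like_ocid(value)]


# Placeholder values copied by hand from a build spec file.
ids = {
    "knowledgeBaseId": "my-knowledge-base",
    "vulnerabilityCompartmentId": "ocid1.compartment.oc1..exampleuniqueid",
}

print(malformed_ids(ids))  # ['knowledgeBaseId']
```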

IAM policies not defined:

The VulnerabilityAudit step might fail if no policies are defined to allow the build pipeline to access the ADM resources.

Define policies to access the ADM resources.

Server error:

A server error might occur because of an intermittent outage.

Open a service request.

Configuring Private Connection

Build run fails consistently at various steps.

Build run fails at 'Provision Private Access' step:

The error message might be: private access setup failed because of subnetId or nsgId not authenticated or not found.

This can occur if the Identity and Access Management (IAM) policies are wrong, or if the subnetId or nsgId values given during configuration are invalid.

Depending on the cause of the error, try one of the following resolutions:

  1. Correct the IAM policies. For policy examples, see build policies.
  2. Verify that the subnetId and nsgId values are correct.

Build run fails consistently at 'Setup Software Environment' step:

The error message might be: unable to fetch certificate bundle from certificate service for build source <source> and caBundleId <Oracle Cloud Identifier (OCID) of ca bundle>.

This can occur if the IAM policy that allows the build pipeline to access the certificate is incorrect, or if you configured a wrong caBundleId on the connection resource.

Depending on the cause of the error, try one of the following resolutions:

  1. Correct the IAM policies. For policy examples, see build policies.
  2. Verify the caBundleId and correct it in the external connection resource.

Build run fails consistently at 'Download Source' step:

If the URL is correct and the repository server uses a self-signed certificate, check the TLSVerifyConfiguration on the connection resource.

Another possible cause is that you configured a repository server whose SSL certificate isn't known.

Follow these steps to configure Transport Layer Security (TLS) verification in the external connection resource:

  1. Get the Certificate Authority (CA) certificate using: echo -n | openssl s_client -connect <host>:<port> | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ca.pem
  2. Upload the certificate to a CA bundle resource.
  3. While creating the external connection resource, configure TLSVerifyConfiguration by selecting the CA bundle resource that you created.
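Where openssl isn't available, step 1 can be approximated with the Python standard library. The sketch below is an assumption-laden alternative, not the documented procedure. It relies on ssl.get_server_certificate, which returns the certificate that the server presents as PEM text without validating trust. For a self-signed server certificate, that is the certificate to upload. The function names and the default file name are illustrative.

```python
import ssl
from pathlib import Path


def fetch_server_certificate(host: str, port: int = 443) -> str:
    """Return the certificate presented by host:port as PEM text."""
    return ssl.get_server_certificate((host, port))


def save_pem(pem_text: str, path: str = "ca.pem") -> Path:
    """Write PEM text to path after checking that it is framed as a certificate."""
    # Raises ValueError if the BEGIN/END CERTIFICATE framing is missing.
    ssl.PEM_cert_to_DER_cert(pem_text)
    target = Path(path)
    target.write_text(pem_text)
    return target


# Example usage (replace <host> and <port> with your repository server):
# save_pem(fetch_server_certificate("<host>", <port>))
```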

Build run fails consistently at 'Download Source' step:

The error message might be: unable to access the repository <repo URL>/Failed to connect to <repo ID and port number>.

This can occur if the repository URL configured in one of the buildSource entries in the buildStage isn't reachable from the build runner.

Check that the configured repository URL is correct, and configure private access if the repository URL points to a private IP address.
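To decide whether private access is needed, you can test whether the host part of the repository URL is a private IP address. The following minimal sketch uses the Python standard library. The function name and sample URLs are hypothetical, and a URL that uses a DNS name returns False because the name isn't resolved here.

```python
import ipaddress
from urllib.parse import urlsplit


def repo_host_is_private_ip(repo_url: str) -> bool:
    """Return True if the URL's host is an IP literal in a private range."""
    host = urlsplit(repo_url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # The host is a DNS name, not an IP literal.
        return False


print(repo_host_is_private_ip("https://10.0.1.25/scm/project/repo.git"))  # True
print(repo_host_is_private_ip("https://git.example.com/project/repo.git"))  # False
```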

Build run fails consistently at the 'Setup Software Env' step, the 'Download Source' step, or any customer-defined step in the build spec file:

One of the following error messages might be displayed:
  1. Internal error. Occurs when the setup_software_env step fails.
  2. Unable to access the repository. Occurs when the download source step fails.
  3. Network-related error in customer logs.

This might occur if you didn't set up the Virtual Cloud Network (VCN) correctly.

When you configure private access, the build runner VM's outgoing traffic is controlled by the VCN. The VCN's internal Domain Name System (DNS) resolution isn't supported for private access. Use IP addresses to communicate with services hosted in the private network. The following prerequisites must be met for the VCN (subnet) in which you're configuring private access:

  1. The configured VCN must have either a service gateway or a NAT gateway.
  2. Route rules must be added to provide access to all the OCI services through one of these gateways. If the source code or the commands in the build specification file need access to the internet, then appropriate rules must exist before you run the build. This is required because all outgoing traffic from the build runner goes through the VCN.

Build run fails when running docker commands in the build specification from the Managed Build stage with private access:

This error occurs when the Managed Build stage is configured with private access and the build spec file contains docker commands. For example, docker build fails with a DNS resolution error or a connection timeout error if the Dockerfile contains commands that access the internet. Similarly, docker run fails with a DNS resolution error or a connection timeout error when the container tries to access the internet.

To resolve this, use --network host in the docker commands so that the container uses the host's network. For example:
docker build --network host -t hello-world:1.0 .
docker run --network host hello-world:1.0