Running Applications on GPU Nodes

Find out how to run applications on GPU worker nodes in clusters created using Container Engine for Kubernetes (OKE).

To run an application on a GPU worker node, you select a GPU shape and a compatible GPU image for the nodes in a managed node pool.

When you use Container Engine for Kubernetes to create clusters, you select a shape for the nodes in each node pool. The shape determines the number of CPUs and the amount of memory allocated to each node in the node pool. Among the shapes you can select are GPU (Graphics Processing Unit) shapes, with the GPUs themselves on NVIDIA graphics cards. Originally intended for manipulating images and graphics, GPUs are very efficient at processing large blocks of data in parallel. This capability makes GPUs a good option when deploying data-intensive applications.

The massive parallel computing functionality of NVIDIA GPUs is accessed using CUDA (Compute Unified Device Architecture) libraries. Different GPUs (for example, NVIDIA® Tesla Volta™, NVIDIA® Tesla Pascal™) require specific versions of the CUDA libraries.

When you select a GPU shape for a node pool, you must also select a compatible Oracle Linux GPU image that has the CUDA libraries pre-installed.
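
For example, you can create a managed node pool that uses a GPU shape with the Oracle Cloud Infrastructure CLI. The following is a minimal sketch: the OCIDs are placeholders, and the parameters for specifying the GPU image and the placement configuration (availability domain and worker subnet) depend on the CLI version, so check oci ce node-pool create --help for the full parameter list before running it:

# Create a one-node managed node pool that uses the VM.GPU3.1 shape.
# The GPU image and the placement configuration (availability domain and
# worker subnet) must also be supplied; the parameter names for these
# depend on the CLI version.
oci ce node-pool create \
  --cluster-id <cluster-ocid> \
  --compartment-id <compartment-ocid> \
  --name gpu-pool \
  --node-shape VM.GPU3.1 \
  --size 1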

When you deploy an application on a cluster you've created with Container Engine for Kubernetes, you specify in the pod spec the number of GPU resources the application requires. The kube-scheduler then determines which node has the necessary resources available. When an application pod runs on a node with a GPU shape, the following are mounted into the pod:

  • the requested number of GPU devices
  • the node's CUDA library

The application is effectively isolated from the underlying type of GPU. As a result, CUDA libraries for different GPUs do not have to be included in the application container, ensuring the container remains portable.
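
To confirm that worker nodes in a GPU node pool are advertising their GPUs to the kube-scheduler, you can list each node's allocatable nvidia.com/gpu count. The following command is a minimal check (it assumes the device plugin on the node has registered the nvidia.com/gpu extended resource):

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

Nodes with non-GPU shapes show <none> in the GPU column.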

Note the following:

  • You can specify GPU shapes for node pools in clusters running Kubernetes version 1.19.7 or later. Do not specify a GPU shape for node pools running earlier versions of Kubernetes.
  • You can use the Console, the API, or the CLI to specify a GPU image for use on a GPU shape (the image name includes 'GPU'). You can also use the API or the CLI to specify a non-GPU image for use on a GPU shape.
  • Having created a node pool with a GPU shape, you cannot change the node pool to have a non-GPU shape. Likewise, you cannot change a node pool with a non-GPU shape to have a GPU shape.
  • GPU shapes are not necessarily available in every availability domain.
  • You can specify GPU shapes for node pools in clusters that have Kubernetes API endpoints hosted in a subnet of your VCN. Do not specify a GPU shape for node pools in a cluster if the cluster's Kubernetes API endpoint is not integrated into your VCN.
  • You can run applications on GPU worker nodes in managed node pools, but not in virtual node pools.

Defining a pod to run only on nodes that have a GPU

The following configuration file defines a pod to run on any node in the cluster that has one available GPU resource (regardless of the type of GPU):


apiVersion: v1
kind: Pod
metadata:
  name: test-with-gpu-workload
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: k8s.gcr.io/cuda-vector-add:v0.1
      resources:
        limits:
          # Request one GPU of any type; GPU worker nodes expose their GPUs
          # to Kubernetes as the extended resource nvidia.com/gpu.
          nvidia.com/gpu: 1
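
After saving the manifest (as gpu-pod.yaml here, an assumed filename), you can deploy the pod and confirm that it was scheduled onto a GPU worker node:

kubectl apply -f gpu-pod.yaml
kubectl get pod test-with-gpu-workload -o wide    # the NODE column shows the node the pod is running on
kubectl logs test-with-gpu-workload               # output from the CUDA vector-add sample workload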

Defining a pod to run only on nodes that do not have a GPU

The following configuration file defines a pod to run only on nodes in the cluster that do not have a GPU:


apiVersion: v1
kind: Pod
metadata:
  name: test-with-non-gpu-workload
spec:
  restartPolicy: OnFailure
  containers:
    - name: test-with-non-gpu-workload
      image: "oraclelinux:8"

GPU shapes supported by Container Engine for Kubernetes

Container Engine for Kubernetes supports the following GPU shapes:

  • Virtual Machine (VM) GPU shapes:
    • VM.GPU2.1
    • VM.GPU3.1
    • VM.GPU3.2
    • VM.GPU3.4
    • VM.GPU.A10.1
    • VM.GPU.A10.2
  • Bare Metal (BM) GPU shapes:
    • BM.GPU2.2
    • BM.GPU3.8
    • BM.GPU4.8
    • BM.GPU.A100-v2.8
    • BM.GPU.A10.4

Note that due to service limits and compartment quotas, some of the supported GPU shapes might not be available in your particular tenancy.