Cluster Management Best Practices

Find out about best practices for managing clusters you've created using Container Engine for Kubernetes (OKE).

This section contains best practices for cluster management and Container Engine for Kubernetes.

Best Practice: Use Kubernetes labels

We recommend that you use Kubernetes labels to organize the many Kubernetes resources (such as services, pods, containers, networks) that comprise a cluster.

Kubernetes labels are key-value pairs that help you to maintain these resources and keep track of how they interact with each other in a cluster.

For example, you can use the oci.oraclecloud.com/oke-is-preemptible=true label (which Container Engine for Kubernetes applies to worker nodes hosted on preemptible instances) with Kubernetes node selectors and node affinity/anti-affinity to control which pods are scheduled on those worker nodes.
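The following pod specification is a minimal sketch (the pod name, container name, and image are illustrative; only the label key comes from Container Engine for Kubernetes) that uses a node selector to schedule a fault-tolerant workload only onto worker nodes hosted on preemptible instances:

    apiVersion: v1
    kind: Pod
    metadata:
      name: batch-worker              # illustrative name
    spec:
      # Schedule this pod only on worker nodes that Container Engine for
      # Kubernetes has labeled as preemptible.
      nodeSelector:
        oci.oraclecloud.com/oke-is-preemptible: "true"
      containers:
      - name: worker                  # illustrative name
        image: busybox:1.36           # illustrative image
        command: ["sleep", "3600"]

To keep a pod off preemptible worker nodes instead, you can use node affinity with a DoesNotExist requirement on the same label key.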

See Well-Known Labels, Annotations and Taints in the Kubernetes documentation.

Best Practice: Use OCI resource tagging

We recommend that you use OCI resource tagging to organize the many resources (such as worker nodes, VCNs, load balancers, and block volumes) used by the Kubernetes clusters you create with Container Engine for Kubernetes. When there are a large number of resources spread across multiple compartments in a tenancy, you can find it difficult to track the resources used for specific purposes. Equally, you can find it difficult to aggregate the resources, report on them, and take bulk actions on them.

Tagging enables you to define keys and values, and associate them with resources. You can then use the tags to organize and list resources based on your business needs.
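As an illustrative sketch (not a definitive command), assuming the cluster already exists, a freeform tag can be applied with the OCI CLI along the following lines; the tag key and value are placeholders, and you can confirm the available parameters with oci ce cluster update --help:

    # Sketch only: apply a freeform tag to an existing cluster.
    # <cluster-ocid> is a placeholder, and the "project"/"alpha" key-value pair is illustrative.
    oci ce cluster update \
      --cluster-id <cluster-ocid> \
      --freeform-tags '{"project": "alpha"}'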

See Tagging Kubernetes Cluster-Related Resources.

Best Practice: Set resource requests and limits

We recommend that you set:

  • resource requests, to specify the minimum amount of resources guaranteed to a container
  • resource limits, to specify the maximum amount of resources a container can use

When working with a Kubernetes cluster, a common challenge is the occasional failure of an application to deploy because insufficient resources are available on the cluster. Such failures are often caused by resource requests and resource limits not having been set.

If you do not set resource requests and limits, pods in a cluster can start using more resources than they need. If a pod consumes excessive CPU or memory on a node, the kube-scheduler might be unable to place new pods on that node, and the node itself might even crash.
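For example, the following pod specification (the pod name, container name, image, and values are illustrative) sets requests and limits for CPU and memory on a single container:

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-pod                  # illustrative name
    spec:
      containers:
      - name: app                    # illustrative name
        image: nginx:1.25            # illustrative image
        resources:
          requests:
            cpu: 250m                # CPU reserved for the container and used for scheduling
            memory: 256Mi
          limits:
            cpu: 500m                # maximum CPU the container can use
            memory: 512Mi            # the container is terminated if it exceeds this amount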

See Requests and limits in the Kubernetes documentation.

Best Practice: Reserve resources for Kubernetes and OS system daemons

We recommend that you use the --kube-reserved and --system-reserved kubelet flags to reserve CPU and memory resources for Kubernetes system daemons (such as kubelet and container runtime) and OS system daemons (such as sshd and systemd) respectively. For example:

  • --kube-reserved=cpu=500m,memory=1Gi
  • --system-reserved=cpu=100m,memory=100Mi

Pods running on a worker node can consume all available CPU and memory resources, and so prevent other essential processes (such as the Kubernetes and OS system daemons) from running on the node. When Kubernetes and OS system daemons cannot run, the worker node can become unresponsive, unstable, and unexpectedly crash under heavy load.

To prevent pods requesting resources that are required by the Kubernetes and OS system daemons, include the --kube-reserved and --system-reserved kubelet flags as kubelet-extra-args options in a custom cloud-init script. For more information and an example, see Example 4: Using a Custom Cloud-init Script to Reserve Resources for Kubernetes and OS System Daemons.
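As a minimal sketch of that approach (based on the bootstrap pattern used in the referenced example; confirm the exact commands against the default cloud-init script for your node pools), a custom cloud-init script might look like the following:

    #!/bin/bash
    # Sketch of a custom cloud-init script for OKE worker nodes: download and run
    # the default OKE bootstrap script, passing extra kubelet arguments that
    # reserve resources for Kubernetes and OS system daemons.
    curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode > /var/run/oke-init.sh
    bash /var/run/oke-init.sh --kubelet-extra-args "--kube-reserved=cpu=500m,memory=1Gi --system-reserved=cpu=100m,memory=100Mi"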

When using the --kube-reserved kubelet flag to reserve a portion of a worker node's CPU and memory resources for use by Kubernetes system daemons, consider the following recommendations:

  • The amount of CPU resource that we recommend you reserve for Kubernetes system daemons depends on the number of CPU cores on the worker node, as follows:
    - 1 core: 60 m
    - 2 cores: 70 m
    - 3 cores: 80 m
    - 4 cores: 85 m
    - 5 cores: 90 m
    - More than 5 cores: an additional 2.5 m for every additional core on the worker node
  • The amount of memory resource that we recommend you reserve for Kubernetes system daemons depends on the amount of memory on the worker node, as follows:
    - 4 GiB of memory: 1 GiB
    - 8 GiB of memory: 1 GiB
    - 16 GiB of memory: 2 GiB
    - 128 GiB of memory: 9 GiB
    - More than 128 GiB of memory: an additional 20 MiB for every additional GiB of worker node memory

When using the --system-reserved kubelet flag to reserve a portion of a node's CPU and memory resources for use by OS system daemons, consider the following recommendations:

  • The amount of CPU resource that we recommend you reserve for OS system daemons (regardless of node shape) is 100 m (millicore).
  • The amount of memory resource that we recommend you reserve for OS system daemons (regardless of node shape) is 100 Mi (mebibytes).
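For example, for a worker node with 4 CPU cores and 16 GiB of memory, these recommendations translate into the following kubelet flags:

  • --kube-reserved=cpu=85m,memory=2Gi
  • --system-reserved=cpu=100m,memory=100Mi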

Note that our CPU and memory recommendations for the --kube-reserved and --system-reserved kubelet flags might not be optimal for the workloads you intend to run, so you might need to alter the values accordingly. You might also need to adjust the values over time.

Additional recommendations:

  • Before applying reservation changes to production clusters, always benchmark and test the impact of the reservation changes in a non-production environment.
  • Use the --eviction-hard or --eviction-soft kubelet flags to set appropriate thresholds for memory and disk pressure. When you set these thresholds, Kubernetes can protect system stability by evicting less important pods when necessary (see the sketch after this list). For more information, see Node-pressure Eviction in the Kubernetes documentation.
  • Be aware that reserving too many resources can lead to the under-utilization of nodes. Your goal is to find an appropriate balance between guaranteeing resource availability for critical components and maximizing resource availability for workloads. We recommend that you start with larger resource reservations and gradually reduce them based on observation, rather than starting with reservations that are too small and risking system instability. Use metrics from monitoring and alerting tools to observe the resource usage of Kubernetes and system components over time.
  • When reserving resources, take account of differences in node shape and workload type. Large nodes might require larger absolute reservations than smaller nodes. Workloads with specific resource needs or known burst patterns might require larger or smaller resource reservations.
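As an illustration of the eviction thresholds mentioned in the list above, the following kubelet flags are a sketch only; the threshold values are illustrative and need tuning for your own nodes and workloads:

  • --eviction-hard=memory.available<200Mi,nodefs.available<10%
  • --eviction-soft=memory.available<500Mi
  • --eviction-soft-grace-period=memory.available=1m30s

As with the reservation flags, you can pass these flags as kubelet-extra-args options in a custom cloud-init script.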

For more information about reserving resources, see Reserve Compute Resources for System Daemons in the Kubernetes documentation.

Best Practice: Provide dedicated nodes using taints and tolerations

We recommend that you use Kubernetes taints and tolerations to limit resource-intensive applications to specific worker nodes.

Using taints and tolerations enables you to keep node resources available for workloads that require them, and prevents the scheduling of other workloads on the nodes.

For example, when you create a cluster using Container Engine for Kubernetes, you can define worker nodes to have a GPU shape, or a shape with a large number of powerful CPUs. These high-specification worker nodes are ideal for large data processing workloads. However, such specialized hardware is typically expensive to deploy, so you'll usually want to limit the workloads that can be scheduled on these nodes. To do so, add a taint to the nodes by running one of the following commands:

  • kubectl taint nodes <node-name> special=true:NoSchedule
  • kubectl taint nodes <node-name> special=true:PreferNoSchedule

Having added a taint to the high-specification worker nodes, add a corresponding toleration to the pods that you want to allow to run on those nodes.
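For example, the following pod specification is a sketch (the pod name, container name, and image are illustrative) of a toleration that matches the special=true:NoSchedule taint shown above:

    apiVersion: v1
    kind: Pod
    metadata:
      name: data-processing-pod      # illustrative name
    spec:
      # Allow this pod to be scheduled on nodes tainted with special=true:NoSchedule.
      tolerations:
      - key: "special"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      containers:
      - name: processor              # illustrative name
        image: busybox:1.36          # illustrative image
        command: ["sleep", "3600"]

Note that a toleration allows the pod to be scheduled on the tainted nodes but does not require it; to restrict the pod to those nodes, combine the toleration with a node selector or node affinity.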

Similarly, you can use the oci.oraclecloud.com/oke-is-preemptible=true label (which Container Engine for Kubernetes applies to worker nodes hosted on preemptible instances) with Kubernetes tolerations to control which pods are scheduled on those worker nodes.

See Taints and Tolerations in the Kubernetes documentation.

Best Practice: Control pod scheduling using node selectors and affinity

There are several ways to constrain a pod to run on particular nodes, or to express a preference for it to run on particular nodes. The recommended approaches all use label selectors to facilitate the selection. Often, the kube-scheduler automatically makes a reasonable placement without such constraints or preferences. However, there are some circumstances in which you might want to control the node on which a pod runs.

In these situations, we recommend that you control the scheduling of pods on nodes using Kubernetes node selectors, node affinity, and inter-pod affinity.

Using node selectors, node affinity, and inter-pod affinity enables the kube-scheduler to logically isolate workloads, for example by scheduling them on nodes with particular hardware.

For example, you might give nodes a label to indicate that they have locally attached SSD storage. To specify that a pod must run only on nodes with locally attached SSD storage, include that label as a node selector in the pod specification. Kubernetes then schedules the pod only on nodes with matching labels.
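The following pod specification is a sketch (the disktype=ssd label, pod name, container name, and image are illustrative) that assumes the SSD-backed nodes have already been labeled, for example with kubectl label nodes <node-name> disktype=ssd:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ssd-app                  # illustrative name
    spec:
      # Schedule this pod only on nodes carrying the disktype=ssd label.
      nodeSelector:
        disktype: ssd
      containers:
      - name: app                    # illustrative name
        image: nginx:1.25            # illustrative image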

See Assigning Pods to Nodes in the Kubernetes documentation.

Best Practice: Use third-party tools for backup and disaster recovery

We recommend that you use third-party tools (such as Kasten, Rancher, Trilio, or Velero) with Container Engine for Kubernetes for backup and disaster recovery.

The combined backup and disaster recovery capabilities of these tools and Container Engine for Kubernetes can provide a reliable, robust, and scalable Kubernetes platform that's production-ready.

For an example, see A simple guide to Kubernetes disaster recovery across regions with Kasten by Veeam.