Workload Isolation
Overview
When running Slinky in certain environments, it may be necessary to isolate the nodes running Slurm NodeSets from other Kubernetes workloads. Typically, this should only be necessary for the slurmd NodeSets. This document provides an example of how this can be done using taints and tolerations.
Prerequisites
This guide assumes that the user has access to a functional Kubernetes cluster
running slurm-operator. See the quickstart guide for details on setting up
slurm-operator on a Kubernetes cluster.
Taints and Tolerations
Taints are a Kubernetes mechanism that allows a node to repel any pod that lacks a matching toleration. Tolerations are the counterpart mechanism that allows the scheduler to place pods onto nodes with matching taints.
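For illustration, here is a minimal sketch of how the two halves fit together: a bare pod that tolerates the taint used throughout this guide. The pod name and image are hypothetical and for demonstration only.

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo  # hypothetical name, for illustration only
spec:
  containers:
    - name: main
      image: busybox  # placeholder image
      command: ["sleep", "3600"]
  tolerations:
    # Tolerates any taint with this key (regardless of value) and the
    # NoExecute effect, so the pod may be scheduled on, and remain on,
    # the tainted nodes.
    - key: slinky.slurm.net/slurm
      operator: Exists
      effect: NoExecute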
Apply a taint to the nodes that will only run Slurm pods:
kubectl taint nodes kind-worker2 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker3 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker4 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker5 slinky.slurm.net/slurm:NoExecute
Confirm that the taints were applied:
kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name}:{' '}{range .spec.taints[*]}{.key}={.value}:{.effect},{' '}{end}{'\n'}{end}"
kind-control-plane: node-role.kubernetes.io/control-plane=:NoSchedule,
kind-worker:
kind-worker2: slinky.slurm.net/slurm=:NoExecute
kind-worker3: slinky.slurm.net/slurm=:NoExecute
kind-worker4: slinky.slurm.net/slurm=:NoExecute
kind-worker5: slinky.slurm.net/slurm=:NoExecute
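Should a node later need to return to general scheduling, the taint can be removed by appending - to the same argument:

kubectl taint nodes kind-worker2 slinky.slurm.net/slurm:NoExecute-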
Next, configure the tolerations on the slurm-operator components. Each component
of slurm-operator can have its tolerations set in values.yaml. Update the
tolerations section of every component to match the taint applied above. This
must be done for all components in both the slurm and slurm-operator Helm
charts.
# -- Tolerations for pod assignment.
# Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
tolerations:
  - key: slinky.slurm.net/slurm
    operator: Exists
    effect: NoExecute
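Roll out the updated tolerations by upgrading each release with the modified
values.yaml. A minimal sketch, assuming releases named slurm-operator and
slurm; substitute your own release names, chart references, and namespaces:

helm upgrade slurm-operator <slurm-operator-chart> -f values.yaml --namespace slinky
helm upgrade slurm <slurm-chart> -f values.yaml --namespace slurm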
Pod Anti-Affinity
In some cases, pod anti-affinity must be configured to prevent multiple
NodeSet pods (slurmd) from being scheduled on the same node. Pod anti-affinity
can be configured under the affinity section of a NodeSet. To ensure that a
NodeSet pod is never scheduled onto a node that already runs another slurmd pod
(or, with the example below, any other Slurm component), add the following to
the affinity section:
nodesets:
  slinky:
    ...
    # -- Affinity for pod assignment.
    # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - slurmctld
                    - slurmdbd
                    - slurmrestd
                    - mariadb
                    - slurmd
After applying the Helm chart with affinity set in values.yaml, the
affinity section can be observed in the NodeSet by running:
kubectl describe NodeSet --namespace slurm
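Alternatively, the anti-affinity can be pulled out with jsonpath, in the same
style as the taint check above. A sketch, assuming the NodeSet embeds a
standard pod template under spec.template (as in the values above):

kubectl get nodesets --namespace slurm -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.affinity.podAntiAffinity}{"\n"}{end}'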