Workload Isolation

Overview

When running Slinky in certain environments, it may be necessary to isolate the nodes running Slurm NodeSets from other Kubernetes workloads.

By default, slurm-operator does not taint the nodes on which NodeSet pods run. This can be changed by setting taintKubeNodes to true for specific NodeSets in the Slurm Helm chart, or by deploying a NodeSet CR with TaintKubeNodes: true. When enabled, a nodeset.slinky.slurm.net/worker=:NoExecute taint is applied to any node on which a NodeSet slurmd replica is scheduled, and any pod without a toleration matching this taint is evicted by Kubernetes.
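As a sketch, enabling this in the Slurm Helm chart's values.yaml might look like the following (the exact placement of the key under the nodesets values is an assumption, modeled on the anti-affinity example later in this guide):

```yaml
nodesets:
  slinky:
    # -- Taint nodes running this NodeSet's slurmd replicas with
    # nodeset.slinky.slurm.net/worker=:NoExecute (assumed key placement).
    taintKubeNodes: true
```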

This document demonstrates, by applying taints and tolerations manually, what slurm-operator does for NodeSet pods automatically when taintKubeNodes is set.

Pre-requisites

This guide assumes that the user has access to a functional Kubernetes cluster running slurm-operator. See the quickstart guide for details on setting up slurm-operator on a Kubernetes cluster.

Taints and Tolerations

Taints allow a node to repel any pods that lack a matching toleration. Tolerations allow the scheduler to place pods on nodes with matching taints.
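The matching rule can be sketched in Python (illustrative only; the real logic lives in the Kubernetes scheduler and node lifecycle controllers). A toleration matches a taint when the key matches, the operator rule holds (Exists ignores the value; Equal compares it), and the effect matches, with an empty effect tolerating any effect:

```python
# Minimal sketch of Kubernetes taint/toleration matching, not the real
# scheduler implementation.
def tolerates(toleration: dict, taint: dict) -> bool:
    if toleration.get("operator", "Equal") == "Exists":
        # An Exists toleration with an empty key matches every taint key.
        key_ok = toleration.get("key", "") in ("", taint["key"])
        value_ok = True
    else:  # "Equal"
        key_ok = toleration.get("key") == taint["key"]
        value_ok = toleration.get("value", "") == taint.get("value", "")
    # An empty effect tolerates any effect; otherwise effects must match.
    effect_ok = toleration.get("effect", "") in ("", taint["effect"])
    return key_ok and value_ok and effect_ok

taint = {"key": "slinky.slurm.net/slurm", "value": "", "effect": "NoExecute"}

# Exists with a matching key and no effect tolerates the taint:
print(tolerates({"key": "slinky.slurm.net/slurm", "operator": "Exists"}, taint))  # True
# A NoSchedule-only toleration does NOT cover a NoExecute taint:
print(tolerates({"key": "slinky.slurm.net/slurm", "operator": "Exists",
                 "effect": "NoSchedule"}, taint))  # False
```

Note the second case: this is why the toleration effect configured later in this guide must match the effect of the taint that was applied.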

Apply a taint to the nodes that will only run Slurm pods:

kubectl taint nodes kind-worker2 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker3 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker4 slinky.slurm.net/slurm:NoExecute
kubectl taint nodes kind-worker5 slinky.slurm.net/slurm:NoExecute
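Should a node later need to return to general scheduling, the same kubectl taint command with a trailing dash removes the taint, for example:

```shell
kubectl taint nodes kind-worker2 slinky.slurm.net/slurm:NoExecute-
```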

Confirm that the taint was applied:

kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name}:{' '}{range .spec.taints[*]}{.key}={.value}:{.effect},{' '}{end}{'\n'}{end}"

kind-control-plane: node-role.kubernetes.io/control-plane=:NoSchedule,
kind-worker:
kind-worker2: slinky.slurm.net/slurm=:NoExecute
kind-worker3: slinky.slurm.net/slurm=:NoExecute
kind-worker4: slinky.slurm.net/slurm=:NoExecute
kind-worker5: slinky.slurm.net/slurm=:NoExecute

Next, configure the tolerations on the slurm-operator components. Each component's tolerations can be set from within values.yaml. Update the tolerations section of every component to match the taint applied above. This must be done for all components in both the slurm and slurm-operator Helm charts.

  # -- Tolerations for pod assignment.
  # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
  tolerations:
    - key: slinky.slurm.net/slurm
      operator: Exists
      effect: NoExecute

Pod Anti-Affinity

In some cases, pod anti-affinity must be configured to prevent multiple NodeSet (slurmd) pods from being scheduled on the same node. Pod anti-affinity is configured under the affinity section of a NodeSet. To ensure that no two NodeSet pods can be scheduled on the same node, add the following to the affinity section:

nodesets:
  slinky:
    ...
    # -- Affinity for pod assignment.
    # Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - slurmctld
              - slurmdbd
              - slurmrestd
              - mariadb
              - slurmd

After applying the Helm chart with affinity set in values.yaml, the affinity section can be observed in the NodeSet by running:

kubectl describe NodeSet --namespace slurm
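To see where the slurmd pods were actually placed relative to the tainted nodes, list the pods along with their node assignments:

```shell
kubectl get pods --namespace slurm -o wide
```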