Topology

Overview

The operator can propagate topology from Kubernetes nodes to Slurm nodes (those running as NodeSet pods). When a NodeSet pod is running on a Kubernetes node, the operator uses that Kubernetes node's annotations to update the registered Slurm node's topology. A topology file is required for dynamic topology to work.

If topology.yaml or the Kubernetes node annotation is misconfigured, an error is reported in the operator logs.

Kubernetes

Each Kubernetes node should be annotated with topology.slinky.slurm.net/spec. Its value is used by the operator to update the Slurm node's topology information, but only for Slurm nodes running as NodeSet pods on that Kubernetes node.

For example, the following Kubernetes Node snippet has the Slinky topology annotation applied.

apiVersion: v1
kind: Node
metadata:
  name: node0
  annotations:
    topology.slinky.slurm.net/spec: topo-switch:s0,topo-block:b0
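The annotation value is a comma-separated list of topology:unit pairs. As an illustrative sketch only (not the operator's actual code), parsing such a value might look like:

```python
# Parse a topology.slinky.slurm.net/spec annotation value into a
# {topology_name: unit_name} mapping. Illustrative sketch; the
# operator's internal parsing may differ.
def parse_topology_spec(value: str) -> dict[str, str]:
    spec = {}
    for pair in value.split(","):
        topology, _, unit = pair.partition(":")
        if not topology or not unit:
            raise ValueError(f"malformed topology pair: {pair!r}")
        spec[topology.strip()] = unit.strip()
    return spec

print(parse_topology_spec("topo-switch:s0,topo-block:b0"))
# → {'topo-switch': 's0', 'topo-block': 'b0'}
```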

Slurm

Slurm supports topology.yaml, a YAML based configuration file capable of expressing one or more topology configurations in the same Slurm cluster.

Please review the Slurm topology guide.

Example

For example, suppose your Slurm cluster has the following topology.yaml.

---
- topology: topo-switch
  cluster_default: true
  tree:
    switches:
      - switch: sw_root
        children: s[1-2]
      - switch: s1
        nodes: node[1-2]
      - switch: s2
        nodes: node[3-4]
- topology: topo-block
  cluster_default: false
  block:
    block_sizes:
      - 2
      - 4
    blocks:
      - block: b1
        nodes: node[1-2]
      - block: b2
        nodes: node[3-4]
- topology: topo-flat
  cluster_default: false
  flat: true
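The nodes and children fields above use Slurm's bracketed hostlist notation, so node[1-2] means node1 and node2. A minimal sketch that expands the simple single-range form, for illustration only (the full Slurm hostlist grammar also supports comma lists, zero padding, and multiple brackets):

```python
import re

# Expand a simple Slurm hostlist like "node[1-4]" into individual names.
# Minimal sketch: handles only a single [start-end] range, not the full
# Slurm hostlist grammar.
def expand_hostlist(expr: str) -> list[str]:
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if not m:
        return [expr]  # plain hostname, no range to expand
    prefix, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(start, end + 1)]

print(expand_hostlist("node[1-4]"))  # → ['node1', 'node2', 'node3', 'node4']
```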

And your Kubernetes nodes are annotated as follows to match topology.yaml.

---
apiVersion: v1
kind: Node
metadata:
  name: node1
  annotations:
    topology.slinky.slurm.net/spec: topo-switch:s1,topo-block:b1
---
apiVersion: v1
kind: Node
metadata:
  name: node2
  annotations:
    topology.slinky.slurm.net/spec: topo-switch:s1,topo-block:b1
---
apiVersion: v1
kind: Node
metadata:
  name: node3
  annotations:
    topology.slinky.slurm.net/spec: topo-switch:s2,topo-block:b2
---
apiVersion: v1
kind: Node
metadata:
  name: node4
  annotations:
    topology.slinky.slurm.net/spec: topo-switch:s2,topo-block:b2
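Each annotated topology:unit pair must refer to a topology name and unit defined in topology.yaml; otherwise the operator reports an error in its logs. A hedged sketch of such a cross-check, using a hypothetical helper rather than the operator's actual validation code:

```python
# Check that each annotated topology:unit pair refers to a unit defined
# in topology.yaml. Hypothetical validation helper, for illustration only.
def validate_annotation(spec: dict[str, str],
                        defined: dict[str, set[str]]) -> list[str]:
    errors = []
    for topology, unit in spec.items():
        if topology not in defined:
            errors.append(f"unknown topology {topology!r}")
        elif unit not in defined[topology]:
            errors.append(f"unit {unit!r} not defined in topology {topology!r}")
    return errors

# Units taken from the topology.yaml example above.
defined = {"topo-switch": {"sw_root", "s1", "s2"}, "topo-block": {"b1", "b2"}}
print(validate_annotation({"topo-switch": "s2", "topo-block": "b2"}, defined))
# → []
```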

Then, when the slinky-0 NodeSet pod is scheduled onto Kubernetes node node3, the operator updates the Slurm node's topology to match topology.slinky.slurm.net/spec. After the update, Slurm reports the following.

$ scontrol show nodes slinky-0 | grep -Eo "NodeName=[^ ]+|[ ]*Comment=[^ ]+|[ ]*Topology=[^ ]+"
NodeName=slinky-0
   Comment={"namespace":"slurm","podName":"slurm-worker-slinky-0","node":"node3"}
   Topology=topo-switch:s2,topo-block:b2