NodeSet Controller
Overview
The nodeset controller is responsible for managing and reconciling the NodeSet CRD, which represents a set of homogeneous Slurm Nodes.
Design
In addition to managing resources through the Kubernetes API, the controller takes the state of Slurm into account when making certain reconciliation decisions.
Sequence Diagram
Slurm Node State Visibility
The operator projects Slurm node states onto Kubernetes pod conditions so that
external tools can observe Slurm state without querying the Slurm REST API
directly. Every condition type uses the prefix SlurmNodeState.
Base states — exactly one is active at a time:
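| Condition Type | Slurm State |
|---|---|
| SlurmNodeStateAllocated | Allocated |
| SlurmNodeStateDown | Down |
| SlurmNodeStateError | Error |
| SlurmNodeStateFuture | Future |
| SlurmNodeStateIdle | Idle |
| SlurmNodeStateMixed | Mixed |
| SlurmNodeStateUnknown | Unknown |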
Flag states — combinable with base states:
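| Condition Type | Slurm Flag |
|---|---|
| SlurmNodeStateCompleting | Completing |
| SlurmNodeStateDrain | Drain |
| SlurmNodeStateFail | Fail |
| SlurmNodeStateInvalid | Invalid |
| SlurmNodeStateInvalidReg | InvalidReg |
| SlurmNodeStateMaintenance | Maintenance |
| SlurmNodeStateNotResponding | NotResponding |
| SlurmNodeStateUndrain | Undrain |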
The .Message field on the SlurmNodeStateDrain condition carries the Slurm
drain reason string.
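As a rough illustration of the projection described above, the following Python sketch builds a list of condition entries from one base state and a set of flags. The helper name `project_conditions`, the dict layout, and the choice to emit only active conditions are assumptions made for this example, not the operator's actual code. Only the SlurmNodeState prefix, the state names, and the use of the message field for the drain reason come from this document.

```python
PREFIX = "SlurmNodeState"

BASE_STATES = ("Allocated", "Down", "Error", "Future", "Idle", "Mixed", "Unknown")
FLAG_STATES = ("Completing", "Drain", "Fail", "Invalid", "InvalidReg",
               "Maintenance", "NotResponding", "Undrain")


def project_conditions(base, flags=(), drain_reason=""):
    """Return condition-like dicts for one Slurm node (hypothetical helper)."""
    if base not in BASE_STATES:
        raise ValueError(f"unknown base state: {base}")
    unknown = set(flags) - set(FLAG_STATES)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")

    # Exactly one base state is active at a time.
    conditions = [{"type": PREFIX + base, "status": "True", "message": ""}]
    # Flags combine with the base state; iterate in a fixed order for stable output.
    for flag in FLAG_STATES:
        if flag in flags:
            # Only the Drain condition carries the Slurm drain reason.
            message = drain_reason if flag == "Drain" else ""
            conditions.append(
                {"type": PREFIX + flag, "status": "True", "message": message}
            )
    return conditions
```

For example, a node in the Mixed state with the Drain flag would yield two entries under these assumptions: SlurmNodeStateMixed and SlurmNodeStateDrain, with the drain reason on the latter.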
Conceptual states derived from condition combinations:
- Busy: the node is running work. Allocated, Mixed, or Completing is True.
- Drain: the node has the drain flag set. Drain is True AND Undrain is not True.
- Drained: drain is complete. Drain AND NOT Busy.
- Draining: drain is in progress. Drain AND Busy.
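The conceptual states above reduce to a few boolean checks. The sketch below assumes, purely for illustration, that a pod's conditions are available as a dict mapping condition type to a status string. The function names are hypothetical and not part of the operator's API.

```python
PREFIX = "SlurmNodeState"


def is_true(conditions, name):
    """True when the condition SlurmNodeState<name> is present with status "True"."""
    return conditions.get(PREFIX + name) == "True"


def is_busy(conditions):
    # Busy: Allocated, Mixed, or Completing is True.
    return any(is_true(conditions, n) for n in ("Allocated", "Mixed", "Completing"))


def has_drain(conditions):
    # Drain: Drain is True AND Undrain is not True.
    return is_true(conditions, "Drain") and not is_true(conditions, "Undrain")


def is_drained(conditions):
    # Drained: Drain AND NOT Busy.
    return has_drain(conditions) and not is_busy(conditions)


def is_draining(conditions):
    # Draining: Drain AND Busy.
    return has_drain(conditions) and is_busy(conditions)
```

An external tool could apply the same checks to the conditions it reads from the pod status, without querying Slurm.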
Drain and Cordon Design
The operator implements a unidirectional Kubernetes-to-Slurm cordon model. Cordoning a Kubernetes node or annotating a pod triggers a Slurm drain; uncordoning reverses it. The operator never initiates a cordon on its own — it only reacts to external signals.
The operator prefixes all drain reasons it sets with slurm-operator:. Drain
reasons without this prefix are treated as externally owned and are never
modified or cleared. This ensures that drains set by administrators or external
tools via scontrol are preserved across reconciliation cycles.
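A minimal sketch of how the ownership rule and the cordon reaction could fit together is shown below. It is an illustration under stated assumptions, not the operator's implementation: the function names, the tuple-based return value, and the reason text "cordoned" are hypothetical. Only the slurm-operator: prefix and the rule that unprefixed reasons are left untouched come from this document.

```python
OPERATOR_PREFIX = "slurm-operator:"


def operator_owns(reason):
    """A drain reason belongs to the operator only if it carries the prefix."""
    return reason.startswith(OPERATOR_PREFIX)


def plan_drain(cordon_requested, slurm_drained, current_reason):
    """Decide what to do with one Slurm node (hypothetical helper).

    Returns ("drain", reason), ("undrain", None), or ("noop", None).
    """
    if cordon_requested:
        if slurm_drained:
            # Already drained, by the operator or externally: leave the reason alone.
            return ("noop", None)
        # The reason text after the prefix is an assumption for this example.
        return ("drain", f"{OPERATOR_PREFIX} cordoned")
    if slurm_drained and operator_owns(current_reason):
        # Uncordon reverses only drains the operator set itself.
        return ("undrain", None)
    # Externally owned drains are never modified or cleared.
    return ("noop", None)
```

With this shape, a drain set by an administrator with a reason such as "admin maintenance" survives every reconciliation pass, because it never matches the prefix check.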
Scale-in Lifecycle
During scale-in, the controller selects pods for deletion using a multi-criteria sort order (see Influencing Scale-in Order) and then drains each selected pod’s Slurm node before deleting it. A pod is never deleted until its Slurm node is fully drained, ensuring running jobs are not interrupted.
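The drain-before-delete rule can be sketched as a single reconcile pass over pods that have already been selected for deletion. This is a simplified illustration: `scale_in_step` and its callback parameters are hypothetical names, the selection and sort step is omitted, and `request_drain` is assumed to be idempotent because it may be called again on later passes.

```python
def scale_in_step(selected_pods, is_drained, request_drain, delete_pod):
    """Run one pass over pods chosen for scale-in (hypothetical helper).

    Deletes pods whose Slurm node is fully drained and requests a drain
    for the rest. Returns the pods still waiting on a drain.
    """
    pending = []
    for pod in selected_pods:
        if is_drained(pod):
            # Safe to delete: no running jobs remain on the Slurm node.
            delete_pod(pod)
        else:
            # Not drained yet: ask Slurm to drain and retry on a later pass.
            request_drain(pod)
            pending.append(pod)
    return pending
```

A caller would repeat the pass on each reconciliation until the returned list is empty, so a pod running a long job simply stays in place until that job finishes.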
For practical usage of annotations, labels, and integration patterns, see NodeSet Operations.