NodeSet Controller
Overview
The NodeSet controller manages and reconciles the NodeSet CRD, which represents a set of homogeneous Slurm nodes.
Design
Beyond the usual work of managing Kubernetes resources through the Kubernetes API, this controller also takes the state of Slurm into account when making certain reconciliation decisions. For example, when scaling in, it drains a Slurm node and waits until that node is drained before deleting the corresponding pod.
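To make the idea of Slurm-aware reconciliation concrete, here is a minimal Go sketch of how a scale decision might be expressed. It is not the operator's actual code: the `Action` values, the `SlurmNodeState` struct, and the `Decide` function are hypothetical names introduced only for illustration, and the real controller works with Kubernetes and Slurm client types rather than plain integers and booleans.

```go
package main

import "fmt"

// Action is a hypothetical label for the next step the controller takes.
type Action string

const (
	ActionNone       Action = "none"
	ActionCreatePods Action = "create-pods"
	ActionDrainNode  Action = "drain-node"
	ActionDeletePod  Action = "delete-pod"
	ActionRequeue    Action = "requeue"
)

// SlurmNodeState is a simplified, assumed view of the Slurm node that
// backs the pod chosen for removal during scale-in.
type SlurmNodeState struct {
	DrainRequested bool // a drain has already been requested
	Drained        bool // the node is drained and safe to remove
}

// Decide compares desired and current replica counts and, when scaling in,
// consults the Slurm node state before allowing the pod to be deleted.
func Decide(desired, current int, candidate SlurmNodeState) Action {
	switch {
	case desired > current:
		return ActionCreatePods // scale-out needs no Slurm check
	case desired == current:
		return ActionNone
	case candidate.Drained:
		return ActionDeletePod // Slurm reports the node is drained
	case !candidate.DrainRequested:
		return ActionDrainNode // start draining first
	default:
		return ActionRequeue // drain in progress: check again later
	}
}

func main() {
	fmt.Println(Decide(3, 2, SlurmNodeState{}))
	fmt.Println(Decide(2, 3, SlurmNodeState{}))
	fmt.Println(Decide(2, 3, SlurmNodeState{DrainRequested: true}))
	fmt.Println(Decide(2, 3, SlurmNodeState{DrainRequested: true, Drained: true}))
}
```

Keeping the decision in a pure function like this makes the Slurm-dependent branch easy to test without a cluster; the caller would then perform the chosen action against the Kubernetes API or the Slurm client.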
Sequence Diagram
sequenceDiagram
autonumber
actor User as User
participant KAPI as Kubernetes API
participant NS as NodeSet Controller
box Operator Internals
participant SCM as Slurm Client Map
participant SEC as Slurm Event Channel
end %% Operator Internals
participant SC as Slurm Client
participant SAPI as Slurm REST API
loop Watch Slurm Nodes
SC->>+SAPI: Get Slurm Nodes
SAPI-->>-SC: Return Slurm Nodes
SC->>SEC: Add Event for Cache Delta
end %% loop Watch Slurm Nodes
note over KAPI: Handle CR Update
SEC-->>NS: Watch Event Channel
User->>KAPI: Update NodeSet CR
KAPI-->>NS: Watch NodeSet CRD
opt Scale-out Replicas
NS->>KAPI: Create Pods
end %% Scale-out Replicas
opt Scale-in Replicas
SCM-->>NS: Lookup Slurm Client
NS->>+SC: Drain Slurm Node
SC->>+SAPI: Drain Slurm Node
SAPI-->>-SC: Return Drain Slurm Node Status
SC-->>-NS: Return Drain Slurm Node Status
alt Slurm Node is Drained
NS->>KAPI: Delete Pod
else
NS->>NS: Check Again Later
end %% alt Slurm Node is Drained
end %% opt Scale-in Replicas
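
The "Watch Slurm Nodes" loop in the diagram can be sketched roughly as follows: each poll yields a snapshot of Slurm nodes, the snapshot is compared with the cached one, and an event is emitted for every difference. This is only an illustrative sketch. The `Event` type, the `Delta` function, and the use of a plain map from node name to state string are assumptions made here, not the operator's real types, and the state strings are example values.

```go
package main

import (
	"fmt"
	"sort"
)

// EventType is a hypothetical classification of a cache change.
type EventType string

const (
	NodeAdded   EventType = "added"
	NodeUpdated EventType = "updated"
	NodeDeleted EventType = "deleted"
)

// Event describes one change between two snapshots of Slurm nodes.
type Event struct {
	Type EventType
	Node string
}

// Delta compares the cached snapshot (prev) with a fresh one (next), both
// keyed by node name, and returns one event per change, sorted by node name.
func Delta(prev, next map[string]string) []Event {
	var events []Event
	for name, state := range next {
		old, ok := prev[name]
		switch {
		case !ok:
			events = append(events, Event{Type: NodeAdded, Node: name})
		case old != state:
			events = append(events, Event{Type: NodeUpdated, Node: name})
		}
	}
	for name := range prev {
		if _, ok := next[name]; !ok {
			events = append(events, Event{Type: NodeDeleted, Node: name})
		}
	}
	sort.Slice(events, func(i, j int) bool { return events[i].Node < events[j].Node })
	return events
}

func main() {
	// A buffered channel stands in for the Slurm Event Channel.
	events := make(chan Event, 8)

	prev := map[string]string{"node-0": "IDLE", "node-1": "ALLOCATED"}
	next := map[string]string{"node-0": "DRAINING", "node-2": "IDLE"}

	for _, e := range Delta(prev, next) {
		events <- e
	}
	close(events)

	// The controller side would consume these events to trigger reconciles.
	for e := range events {
		fmt.Println(e.Type, e.Node)
	}
}
```

Emitting only the delta, rather than the full node list, means the controller is woken up only for nodes whose Slurm state actually changed.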