Concepts related to slurm-operator internals and design.
Concepts
- 1: Architecture
- 2: Cluster Controller
- 3: NodeSet Controller
- 4: Cluster CRD
- 5: NodeSet CRD
- 6: Slurm
1 - Architecture
Overview
This document describes the high-level architecture of the Slinky slurm-operator.
Operator
The following diagram illustrates the operator, from a communication perspective.
The slurm-operator follows the Kubernetes operator pattern.
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.
The slurm-operator has one controller for each Custom Resource Definition (CRD) that it is responsible for managing. Each controller has a control loop in which the state of the Custom Resource (CR) is reconciled.
Often, an operator is only concerned with data reported by the Kubernetes API. In our case, we are also concerned with data reported by the Slurm API, which influences how the slurm-operator reconciles certain CRs.
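To make that concrete, the sketch below shows the general shape of such a control loop using controller-runtime. It is a minimal illustration, not the operator's actual code: the GVK, the PingSlurm hook, and the reconcile steps are assumptions.

```go
package controller

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ClusterReconciler sketches a control loop for a Cluster-style CR.
type ClusterReconciler struct {
	client.Client
	// PingSlurm stands in for a call to the Slurm REST API; the real
	// operator keeps per-cluster Slurm clients instead.
	PingSlurm func(ctx context.Context, clusterName string) error
}

func (r *ClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Read the desired state from the Kubernetes API.
	cr := &unstructured.Unstructured{}
	cr.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "example.com", Version: "v1alpha1", Kind: "Cluster", // placeholder GVK
	})
	if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
		// The CR may already be deleted; nothing left to reconcile.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Consult the Slurm API as well -- its state influences how the CR
	//    is reconciled, not just data from the Kubernetes API.
	if err := r.PingSlurm(ctx, cr.GetName()); err != nil {
		// Slurm is unreachable: return the error so reconciliation is retried.
		return ctrl.Result{}, err
	}

	// 3. Converge actual state toward desired state and update status
	//    (omitted here).
	return ctrl.Result{}, nil
}
```

The important point is step 2: the reconciler's decisions can depend on Slurm's reported state, not only on what the Kubernetes API returns.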
Slurm
The following diagram illustrates a containerized Slurm cluster, from a communication perspective.
For additional information about Slurm, see the Slurm documentation.
Hybrid
The following hybrid diagram is an example; there are many possible configurations for a hybrid setup. The core takeaways are:
- slurmd can run on bare metal and still be joined to your containerized Slurm cluster.
- External services that your Slurm cluster needs or wants (e.g. AD/LDAP, NFS, MariaDB) do not have to live in Kubernetes to work with your Slurm cluster.
Autoscale
Kubernetes supports resource autoscaling. In the context of Slurm, autoscaling Slurm compute nodes can be quite useful when your Kubernetes and Slurm clusters have workload fluctuations.
See the autoscaling guide for additional information.
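As a rough illustration of what drives a scale operation, the snippet below patches a NodeSet's spec.replicas with the dynamic client; an autoscaler adjusts the same field automatically. The API group/version, namespace, and resource names are placeholders, so check your installed CRD (for example with kubectl api-resources) before using anything like this.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Load kubeconfig / in-cluster config the controller-runtime way.
	dyn, err := dynamic.NewForConfig(ctrl.GetConfigOrDie())
	if err != nil {
		panic(err)
	}

	// GroupVersionResource for the NodeSet CRD. The group and version here
	// are placeholders -- verify them against the installed CRD.
	gvr := schema.GroupVersionResource{Group: "example.com", Version: "v1alpha1", Resource: "nodesets"}

	// Scaling in or out is a change to spec.replicas; an autoscaler drives
	// the same field. The namespace and object name are illustrative.
	patch := []byte(`{"spec":{"replicas":4}}`)
	_, err = dyn.Resource(gvr).Namespace("slurm").Patch(
		context.TODO(), "slurm-compute", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("scaled NodeSet slurm-compute to 4 replicas")
}
```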
Directory Map
This project follows the conventions of Kubebuilder and the standard Go project layout:
api/
Contains Custom Kubernetes API definitions. These become Custom Resource Definitions (CRDs) and are installed into a Kubernetes cluster.
cmd/
Contains code to be compiled into binary commands.
config/
Contains YAML configuration files used for Kustomize deployments.
docs/
Contains project documentation.
hack/
Contains files for development and Kubebuilder. This includes a kind.sh script that can be used to create a kind cluster with all prerequisites for local testing.
helm/
Contains Helm deployments, including configuration files such as values.yaml.
Helm is the recommended method to install this project into your Kubernetes cluster.
internal/
Contains code that is used internally. This code is not externally importable.
internal/controller/
Contains the controllers.
Each controller is named after the Custom Resource Definition (CRD) it manages; currently, these are the nodeset and cluster controllers.
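As an illustration of how cmd/ and internal/controller/ fit together in a Kubebuilder-style project, the sketch below registers one controller per CRD with a controller-runtime manager. The API group/version and the no-op reconcilers are placeholders, not the operator's actual types.

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// newCR returns an unstructured placeholder for a CRD with the given kind.
// The group/version are illustrative, not the operator's actual API group.
func newCR(kind string) *unstructured.Unstructured {
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(schema.GroupVersionKind{Group: "example.com", Version: "v1alpha1", Kind: kind})
	return u
}

func main() {
	// cmd/ builds a binary like this one: create a manager, then register
	// one controller per CRD (the controllers live in internal/controller/).
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	for _, kind := range []string{"Cluster", "NodeSet"} {
		err := ctrl.NewControllerManagedBy(mgr).
			For(newCR(kind)).
			Complete(reconcile.Func(func(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
				// Each controller reconciles only its own CRD's objects.
				return reconcile.Result{}, nil
			}))
		if err != nil {
			panic(err)
		}
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```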
2 - Cluster Controller
Overview
This controller is responsible for managing and reconciling the Cluster CRD. A Cluster CR represents communication with a Slurm cluster via slurmrestd and auth/jwt.
This controller uses the Slurm client library.
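For orientation, here is a minimal sketch of what an authenticated slurmrestd request looks like at the HTTP level, using plain net/http rather than the Slurm client library the controller actually uses. The service URL, REST API version in the path, and the token source are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"time"
)

// pingSlurm performs the kind of liveness check shown in the sequence diagram
// below: an authenticated request against slurmrestd. The endpoint version
// (v0.0.40 here) depends on the Slurm release in use.
func pingSlurm(ctx context.Context, baseURL, jwt string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/slurm/v0.0.40/ping", nil)
	if err != nil {
		return err
	}
	// auth/jwt: slurmrestd reads the token from this header.
	req.Header.Set("X-SLURM-USER-TOKEN", jwt)

	resp, err := (&http.Client{Timeout: 10 * time.Second}).Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slurmrestd ping returned %s", resp.Status)
	}
	return nil
}

func main() {
	// In the operator, the token comes from the Secret referenced by the
	// Cluster CR; reading it from the environment keeps this sketch small.
	err := pingSlurm(context.Background(), "http://slurm-restapi:6820", os.Getenv("SLURM_JWT"))
	fmt.Println("ping error:", err)
}
```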
Sequence Diagram
sequenceDiagram
  autonumber
  actor User as User
  participant KAPI as Kubernetes API
  participant CC as Cluster Controller
  box Operator Internals
    participant SCM as Slurm Client Map
    participant SEC as Slurm Event Channel
  end %% Operator Internals
  note over KAPI: Handle CR Creation
  User->>KAPI: Create Cluster CR
  KAPI-->>CC: Watch Cluster CRD
  CC->>+KAPI: Get referenced secret
  KAPI-->>-CC: Return secret
  create participant SC as Slurm Client
  CC->>+SC: Create Slurm Client for Cluster
  SC-->>-CC: Return Slurm Client Status
  loop Watch Slurm Nodes
    SC->>+SAPI: Get Slurm Nodes
    SAPI-->>-SC: Return Slurm Nodes
    SC->>SEC: Add Event for Cache Delta
  end %% loop Watch Slurm Nodes
  CC->>SCM: Add Slurm Client to Map
  CC->>+SC: Ping Slurm Control Plane
  SC->>+SAPI: Ping Slurm Control Plane
  SAPI-->>-SC: Return Ping
  SC-->>-CC: Return Ping
  CC->>KAPI: Update Cluster CR Status
  note over KAPI: Handle CR Deletion
  User->>KAPI: Delete Cluster CR
  KAPI-->>CC: Watch Cluster CRD
  SCM-->>CC: Lookup Slurm Client
  destroy SC
  CC-)SC: Shutdown Slurm Client
  CC->>SCM: Remove Slurm Client from Map
  participant SAPI as Slurm REST API
3 - NodeSet Controller
Overview
The nodeset controller is responsible for managing and reconciling the NodeSet CRD, which represents a set of homogeneous Slurm Nodes.
Design
In addition to its regular responsibility of managing resources in Kubernetes via the Kubernetes API, this controller takes the state of Slurm into consideration when making certain reconciliation decisions, such as draining a Slurm node before deleting its pod during scale-in.
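The sketch below expresses that scale-in decision (drain first, delete only once drained, otherwise requeue) against a hypothetical Slurm client interface; the real controller uses the project's Slurm client library and its own types.

```go
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// SlurmNodeClient is a hypothetical stand-in for the operator's Slurm client:
// just enough surface to express the scale-in decision.
type SlurmNodeClient interface {
	Drain(ctx context.Context, nodeName string) error
	IsDrained(ctx context.Context, nodeName string) (bool, error)
}

// reconcileScaleIn mirrors the sequence diagram below: drain the Slurm node
// first, and only delete its pod once Slurm reports it drained.
func reconcileScaleIn(ctx context.Context, slurm SlurmNodeClient, nodeName string, deletePod func(context.Context, string) error) (ctrl.Result, error) {
	if err := slurm.Drain(ctx, nodeName); err != nil {
		return ctrl.Result{}, err // Slurm unreachable: retry with backoff.
	}

	drained, err := slurm.IsDrained(ctx, nodeName)
	if err != nil {
		return ctrl.Result{}, err
	}
	if !drained {
		// Jobs are still running; check again later instead of killing the pod.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	// Safe to remove the pod backing this Slurm node.
	return ctrl.Result{}, deletePod(ctx, nodeName)
}
```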
Sequence Diagram
sequenceDiagram
  autonumber
  actor User as User
  participant KAPI as Kubernetes API
  participant NS as NodeSet Controller
  box Operator Internals
    participant SCM as Slurm Client Map
    participant SEC as Slurm Event Channel
  end %% Operator Internals
  participant SC as Slurm Client
  participant SAPI as Slurm REST API
  loop Watch Slurm Nodes
    SC->>+SAPI: Get Slurm Nodes
    SAPI-->>-SC: Return Slurm Nodes
    SC->>SEC: Add Event for Cache Delta
  end %% loop Watch Slurm Nodes
  note over KAPI: Handle CR Update
  SEC-->>NS: Watch Event Channel
  User->>KAPI: Update NodeSet CR
  KAPI-->>NS: Watch NodeSet CRD
  opt Scale-out Replicas
    NS->>KAPI: Create Pods
  end %% Scale-out Replicas
  opt Scale-in Replicas
    SCM-->>NS: Lookup Slurm Client
    NS->>+SC: Drain Slurm Node
    SC->>+SAPI: Drain Slurm Node
    SAPI-->>-SC: Return Drain Slurm Node Status
    SC-->>-NS: Drain Slurm Node
    alt Slurm Node is Drained
      NS->>KAPI: Delete Pod
    else
      NS->>NS: Check Again Later
    end %% alt Slurm Node is Drained
  end %% opt Scale-in Replicas
4 - Cluster CRD
5 - NodeSet CRD
6 - Slurm
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.
Architecture
See the Slurm architecture docs for more information.