Architecture

Overview

This document describes the high-level architecture of the Slinky slurm-operator.

Operator

The following diagram illustrates the operator, from a communication perspective.

Image

The slurm-operator follows the Kubernetes operator pattern.

Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.

The slurm-operator has one controller for each Custom Resource Definition (CRD) that it is responsible to manage. Each controller has a control loop where the state of the Custom Resource (CR) is reconciled.

Often, an operator is only concerned about data reported by the Kubernetes API. In our case, we are also concerned about data reported by the Slurm API, which influences how the slurm-operator reconciles certain CRs.

Slurm

The following diagram illustrates a containerized Slurm cluster, from a communication perspective.

Image

For additional information about Slurm, see the slurm docs.

Hybrid

The following hybrid diagram is an example. There are many different configurations for a hybrid setup. The core takeaways are: slurmd can be on bare-metal and still be joined to your containerized Slurm cluster; external services that your Slurm cluster needs or wants (e.g. AD/LDAP, NFS, MariaDB) do not have to live in Kubernetes to be functional with your Slurm cluster.

Image

Autoscale

Kubernetes supports resource autoscaling. In the context of Slurm, autoscaling Slurm compute nodes can be quite useful when your Kubernetes and Slurm clusters have workload fluctuations.

Image

See the autoscaling guide for additional information.

Directory Map

This project follows the conventions of:

api/

Contains Custom Kubernetes API definitions. These become Custom Resource Definitions (CRDs) and are installed into a Kubernetes cluster.

cmd/

Contains code to be compiled into binary commands.

config/

Contains yaml configuration files used for kustomize deployments.

docs/

Contains project documentation.

hack/

Contains files for development and Kubebuilder. This includes a kind.sh script that can be used to create a kind cluster with all pre-requisites for local testing.

helm/

Contains helm deployments, including the configuration files such as values.yaml.

Helm is the recommended method to install this project into your Kubernetes cluster.

internal/

Contains code that is used internally. This code is not externally importable.

internal/controller/

Contains the controllers.

Each controller is named after the Custom Resource Definition (CRD) it manages. Currently, this consists of the nodeset and the cluster CRDs.