Architecture
Overview
This document describes the high-level architecture of the Slinky
slurm-bridge.
When slurm-bridge schedules a pod, it coordinates with Slurm to schedule a
placeholder job that represents the pod workload. The placeholder job uses the
external job capability in Slurm 25.05. An external Slurm job operates like
any other job in Slurm, except that it is launched by something other than
Slurm. In the case of slurm-bridge, the placeholder job determines where and
when a pod runs, but kubelet launches the pod instead of slurmd.
Pod Flowchart
The above diagram represents the process of scheduling a pod with Slurm through the following sequence:
1. A pod is applied to a configured slurm-bridge namespace.
2. The pod is sent to the slurm-bridge admission webhook.
3. The slurm-bridge scheduler begins placement of the pod.
4. slurm-bridge coordinates with Slurm to create a “placeholder job”.
5. The placeholder job is scheduled to node1.
6. slurm-bridge determines the placeholder job has started on node1.
7. The scheduler binds the pod to node1.
8. kubelet starts the pod on node1.
Throughout the pod's lifecycle, the slurm-bridge controller reconciles events
from Kubernetes and Slurm.
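The scheduling part of the sequence above (create a placeholder job, wait for it to start, bind the pod) can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the Pod type, the SlurmClient interface, the fakeSlurm implementation, and schedulePod are all hypothetical names invented for this example, and none of them is the real slurm-bridge or Slurm API.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, pared-down stand-in for a Kubernetes pod.
type Pod struct {
	Name     string
	NodeName string // empty until the pod is bound to a node
}

// SlurmClient is a hypothetical interface, not the real Slurm API.
type SlurmClient interface {
	// SubmitPlaceholder creates an external placeholder job for the pod.
	SubmitPlaceholder(podName string) (jobID int, err error)
	// StartedOn returns the node the job started on; ok is false while pending.
	StartedOn(jobID int) (node string, ok bool)
}

// fakeSlurm is an in-memory stand-in that starts every job on one fixed node.
type fakeSlurm struct {
	nextID int
	node   string
	jobs   map[int]string // job ID -> pod name
}

func newFakeSlurm(node string) *fakeSlurm {
	return &fakeSlurm{node: node, jobs: map[int]string{}}
}

func (f *fakeSlurm) SubmitPlaceholder(podName string) (int, error) {
	if podName == "" {
		return 0, errors.New("pod name is required")
	}
	f.nextID++
	f.jobs[f.nextID] = podName
	return f.nextID, nil
}

func (f *fakeSlurm) StartedOn(jobID int) (string, bool) {
	if _, found := f.jobs[jobID]; !found {
		return "", false
	}
	return f.node, true
}

// schedulePod asks Slurm for a placeholder job, then binds the pod to the
// node that job started on. Launching the pod is left to kubelet.
func schedulePod(pod *Pod, slurm SlurmClient) error {
	jobID, err := slurm.SubmitPlaceholder(pod.Name)
	if err != nil {
		return fmt.Errorf("create placeholder job: %w", err)
	}
	node, ok := slurm.StartedOn(jobID)
	if !ok {
		return fmt.Errorf("placeholder job %d has not started yet", jobID)
	}
	pod.NodeName = node // the "bind" step
	return nil
}

func main() {
	pod := &Pod{Name: "example"}
	if err := schedulePod(pod, newFakeSlurm("node1")); err != nil {
		fmt.Println("scheduling failed:", err)
		return
	}
	fmt.Printf("pod %s bound to %s\n", pod.Name, pod.NodeName)
}
```

The sketch keeps the division of labor visible: Slurm decides where and when, while the scheduler only records that decision on the pod.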
Directory Map
This project follows the conventions of:
cmd/
Contains code to be compiled into binary commands.
config/
Contains YAML configuration files used for kustomize deployments.
docs/
Contains project documentation.
hack/
Contains files for development and Kubebuilder. This includes a kind.sh script that creates a kind cluster with all prerequisites for local testing.
helm/
Contains Helm deployments, including configuration files such as values.yaml.
Helm is the recommended method to install this project into your Kubernetes cluster.
internal/
Contains code that is used internally. This code is not externally importable.
internal/admission/
Contains the admission webhook.
The webhook sets the scheduler name for pods created in the configured namespace and enforces policy on labels and annotations used by the Slurm scheduler.
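As a rough illustration of that behavior, the following Go sketch mutates a pod in a managed namespace and rejects one that breaks a policy. Everything here is an assumption made for the example: the Pod type, the admit function, the scheduler name value, and the reserved annotation prefix are hypothetical, and the actual policy enforced by the webhook may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical values chosen for this example only.
const (
	exampleSchedulerName = "example-slurm-scheduler"
	reservedPrefix       = "example.invalid/scheduler-"
)

// Pod is a hypothetical, pared-down stand-in for a Kubernetes pod.
type Pod struct {
	Namespace     string
	SchedulerName string
	Annotations   map[string]string
}

// admit leaves pods outside the managed namespaces untouched. For managed
// pods it rejects annotations under the reserved prefix (an assumed policy),
// then sets the scheduler name so the Slurm-backed scheduler picks them up.
func admit(pod *Pod, managedNamespaces map[string]bool) error {
	if !managedNamespaces[pod.Namespace] {
		return nil
	}
	for key := range pod.Annotations {
		if strings.HasPrefix(key, reservedPrefix) {
			return fmt.Errorf("annotation %q is reserved for the scheduler", key)
		}
	}
	pod.SchedulerName = exampleSchedulerName
	return nil
}

func main() {
	managed := map[string]bool{"slurm-bridge": true}
	pod := &Pod{Namespace: "slurm-bridge"}
	if err := admit(pod, managed); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("scheduler:", pod.SchedulerName)
}
```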
internal/controller/
Contains the node and pod controllers.
The pod controller syncs the state of pods running in Kubernetes with the associated placeholder job managed by Slurm, and vice versa. Similarly, the node controller syncs node states between Kubernetes and Slurm.
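One plausible reading of this two-way sync is that whichever side outlives the other gets cleaned up. The Go sketch below encodes that reading as a small decision function. It is an assumption rather than the project's actual reconcile logic, and the Action type and reconcile function are hypothetical names.

```go
package main

import "fmt"

// Action is a hypothetical description of what a reconcile pass would do.
type Action string

const (
	NoAction  Action = "none"
	DeletePod Action = "delete pod"
	CancelJob Action = "cancel placeholder job"
)

// reconcile compares the two sides and picks a cleanup action:
//   - pod alive, job gone  -> the pod lost its Slurm allocation, delete it
//   - pod gone, job alive  -> the job no longer represents anything, cancel it
//   - otherwise            -> both sides agree, nothing to do
func reconcile(podActive, jobActive bool) Action {
	switch {
	case podActive && !jobActive:
		return DeletePod
	case !podActive && jobActive:
		return CancelJob
	default:
		return NoAction
	}
}

func main() {
	fmt.Println(reconcile(true, false))
	fmt.Println(reconcile(false, true))
	fmt.Println(reconcile(true, true))
}
```

A node controller could follow the same pattern, comparing node state on each side instead of pod and job state.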
internal/scheduler/
Contains scheduling framework plugins. Currently, this consists of slurm-bridge.