slurm-bridge
Download the slurm-bridge repository here, start using slurm-bridge with the quickstart guide, or read on to learn more.
Slurm and Kubernetes are workload managers originally designed for different
kinds of workloads. Kubernetes excels at scheduling workloads that run for an
indefinite amount of time, with potentially vague resource requirements, on a
single node, with loose policy, but can scale its resource pool infinitely to
meet demand; Slurm excels at quickly scheduling workloads that run for a finite
amount of time, with well defined resource requirements and topology, on
multiple nodes, with strict policy, and a known resource pool.
Why you need slurm-bridge and what it can do
This project enables users to take advantage of the best features of both
workload managers. It contains a Kubernetes scheduler to manage select
workloads from Kubernetes, which allows for co-location of Kubernetes and Slurm
workloads within the same cluster. This means the same hardware can be used to
run both traditional HPC and cloud-like workloads, reducing operating costs.
Using slurm-bridge, workloads can be submitted from within a Kubernetes context as a Pod, PodGroup, Job, or JobSet, or from a Slurm context using salloc or sbatch. Workloads submitted via Slurm will execute as they would in a Slurm-only environment, using slurmd. Workloads submitted from Kubernetes will have their resource requirements translated into a representative Slurm job by slurm-bridge. That job will serve as a placeholder and will be scheduled by the Slurm controller. Upon resource allocation to a Kubernetes workload by the Slurm controller, slurm-bridge will bind the workload’s pod(s) to the allocated node(s). At that point, the kubelet will launch and run the pod the same as it would within a standard Kubernetes instance.
For additional architectural notes, see the architecture docs.
Features
slurm-bridge enables scheduling of Kubernetes workloads using the Slurm scheduler, and can take advantage of most of the scheduling features of Slurm itself. These include:
- Priority: assigns priorities to jobs upon submission and
on an ongoing basis (e.g. as they age).
- Preemption: stop one or more low-priority jobs to let a
high-priority job run.
- QoS: sets of policies affecting scheduling priority,
preemption, and resource limits.
- Fairshare: distribute resources equitably among users
and accounts based on historical usage.
- Reservations: reserve resources for select users or groups.
Supported Versions
- Kubernetes Version: >= v1.29
- Slurm Version: >= 25.05
Current Limitations
- Exclusive, whole node allocations are made for each pod.
Get started using slurm-bridge with the quickstart guide!
Versions:
1 - 0.3.x
1.1 - Quickstart
This quickstart guide will help you get slurm-bridge running and configured with your existing cluster.
If you’d like to try out slurm-bridge locally before deploying it on a cluster, consider following our guide for configuring a local test environment instead.
This document assumes a basic understanding of Kubernetes architecture. It is highly recommended that those who are unfamiliar with the core concepts of Kubernetes review the documentation on Kubernetes, pods, and nodes before getting started.
Pre-requisites
- A functional Slurm cluster
- A functional Kubernetes cluster that includes the hosts running a colocated kubelet and slurmd, with:
  - Matching NodeNames in Slurm and Kubernetes for all overlapping nodes. In the event that a colocated node’s Slurm NodeName does not match the Kubernetes Node name, you should patch the Kubernetes node with a label to allow slurm-bridge to map the colocated Kubernetes and Slurm node:
    kubectl patch node $KUBERNETES_NODENAME -p "{\"metadata\":{\"labels\":{\"slinky.slurm.net/slurm-nodename\":\"$SLURM_NODENAME\"}}}"
  - cgroups/v2 configured on all hosts with a colocated kubelet and slurmd (see the verification commands below)
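You can sanity-check both host prerequisites before continuing. These commands are not from the slurm-bridge docs; they only use standard tooling and are shown for illustration:
# On each colocated host, this should print cgroup2fs when cgroups/v2 is in use
stat -fc %T /sys/fs/cgroup
# Confirm the node-name mapping label is present on the Kubernetes node
kubectl get node $KUBERNETES_NODENAME --show-labels | grep slurm-nodename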
Installation
1. Add the required helm chart repository and install cert-manager:
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace --set crds.enabled=true
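Before continuing, you can optionally confirm that the cert-manager pods are running:
kubectl --namespace=cert-manager get pods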
The helm chart used by slurm-bridge has a number of parameters in values.yaml that can be modified to tweak various aspects of slurm-bridge. Most of these values should work without modification.
2. Download values.yaml:
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-bridge/refs/tags/v0.2.0/helm/slurm-bridge/values.yaml \
-o values-bridge.yaml
Depending on your Slurm configuration, you may need to configure the following variables (an example override is shown below):
- schedulerConfig.partition - the default partition with which slurm-bridge will associate jobs. This partition should only include nodes that have both slurmd and the kubelet running. The default value of this variable is slurm-bridge.
- sharedConfig.slurmRestApi - the URL used by slurm-bridge to interact with the Slurm REST API. Changing this value may be necessary if you run the REST API on a different URL or port. The default value of this variable is http://slurm-restapi.slurm:6820.
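For example, a minimal override in values-bridge.yaml using these two keys might look like the following sketch (adjust the partition name and REST API URL to match your cluster):
schedulerConfig:
  partition: slurm-bridge
sharedConfig:
  slurmRestApi: http://slurm-restapi.slurm:6820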
3. Download and install the slurm-bridge package from OCI:
helm install slurm-bridge oci://ghcr.io/slinkyproject/charts/slurm-bridge \
--values=values-bridge.yaml --version=0.3.0 --namespace=slinky --create-namespace
You can check if your cluster deployed successfully with:
kubectl --namespace=slinky get pods
Your output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-bridge-admission-85f89cf884-8c9jt 1/1 Running 0 1m0s
slurm-bridge-controllers-757f64b875-bsfnf 1/1 Running 0 1m0s
slurm-bridge-scheduler-5484467f55-wtspk 1/1 Running 0 1m0s
Running Your First Job
Now that slurm-bridge is configured, we can write a workload. slurm-bridge schedules Kubernetes workloads using the Slurm scheduler by translating a Kubernetes workload (a Job, JobSet, Pod, or PodGroup) into a representative Slurm job, which is used for scheduling purposes. Once a workload is allocated resources, the kubelet binds the Kubernetes workload to the allocated resources and executes it. There are sample workload definitions in the slurm-bridge repo here.
Here’s an example of a simple job, found in hack/examples/job/single.yaml:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-sleep-single
  namespace: slurm-bridge
  annotations:
    slinky.slurm.net/job-name: job-sleep-single
spec:
  completions: 1
  parallelism: 1
  template:
    spec:
      containers:
        - name: sleep
          image: busybox:stable
          command: [sh, -c, sleep 3]
          resources:
            requests:
              cpu: "1"
              memory: 100Mi
            limits:
              cpu: "1"
              memory: 100Mi
      restartPolicy: Never
Let’s run this job:
❯ kubectl apply -f hack/examples/job/single.yaml
job.batch/job-sleep-single created
At this point, Kubernetes has dispatched our job, Slurm has scheduled it, and it has executed to completion. Let’s take a look at each place that our job shows up.
On the Slurm side, we can observe the placeholder job that was used to schedule
our workload:
slurm@slurm-controller-0:/tmp$ scontrol show jobs
JobId=1 JobName=job-sleep-single
UserId=slurm(401) GroupId=slurm(401) MCS_label=kubernetes
Priority=1 Nice=0 Account=(null) QOS=normal
JobState=CANCELLED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:08 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2025-07-10T15:52:53 EligibleTime=2025-07-10T15:52:53
AccrueTime=2025-07-10T15:52:53
StartTime=2025-07-10T15:52:53 EndTime=2025-07-10T15:53:01 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-07-10T15:52:53 Scheduler=Main
Partition=slurm-bridge AllocNode:Sid=10.244.5.5:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=slurm-bridge-1
BatchHost=slurm-bridge-1
StepMgrEnabled=Yes
NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=1,mem=96046M,node=1,billing=1
AllocTRES=cpu=4,mem=96046M,node=1,billing=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=NO Contiguous=0 Licenses=(null) LicensesAlloc=(null) Network=(null)
Command=(null)
WorkDir=/tmp
AdminComment={"pods":["slurm-bridge/job-sleep-single-8wtc2"]}
OOMKillStep=0
Note that the Command field is equal to (null), and that the JobState field is equal to CANCELLED. This is because this Slurm job is only a placeholder - no work is actually done by the placeholder. Instead, the job is cancelled upon allocation so that the kubelet can bind the workload to the selected node(s) for the duration of the job.
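Because the placeholder job is cancelled once the pods are bound, it may disappear from scontrol show jobs fairly quickly. If Slurm accounting is enabled in your cluster (an assumption, not a slurm-bridge requirement), you can still inspect the placeholder afterwards with sacct:
sacct --jobs=1 --format=JobID,JobName,State,Elapsed,NodeList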
We can also look at this job using kubectl:
❯ kubectl describe job --namespace=slurm-bridge job-sleep-single
Name: job-sleep-single
Namespace: slurm-bridge
Selector: batch.kubernetes.io/controller-uid=8a03f5f6-f0c0-4216-ac0b-8c9b70c92eec
Labels: batch.kubernetes.io/controller-uid=8a03f5f6-f0c0-4216-ac0b-8c9b70c92eec
batch.kubernetes.io/job-name=job-sleep-single
controller-uid=8a03f5f6-f0c0-4216-ac0b-8c9b70c92eec
job-name=job-sleep-single
Annotations: slinky.slurm.net/job-name: job-sleep-single
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Thu, 10 Jul 2025 09:52:53 -0600
Completed At: Thu, 10 Jul 2025 09:53:02 -0600
Duration: 9s
Pods Statuses: 0 Active (0 Ready) / 1 Succeeded / 0 Failed
Pod Template:
Labels: batch.kubernetes.io/controller-uid=8a03f5f6-f0c0-4216-ac0b-8c9b70c92eec
batch.kubernetes.io/job-name=job-sleep-single
controller-uid=8a03f5f6-f0c0-4216-ac0b-8c9b70c92eec
job-name=job-sleep-single
Containers:
sleep:
Image: busybox:stable
Port: <none>
Host Port: <none>
Command:
sh
-c
sleep 3
Limits:
cpu: 1
memory: 100Mi
Requests:
cpu: 1
memory: 100Mi
Environment: <none>
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 14m job-controller Created pod: job-sleep-single-8wtc2
Normal Completed 14m job-controller Job completed
As Kubernetes is the context in which this job actually executed, this is
generally the more useful of the two outputs.
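If you want the workload’s own output, you can also pull the pod logs through the Job object (this example’s sleep container prints nothing, but the same command is useful for real workloads):
kubectl --namespace=slurm-bridge logs job/job-sleep-single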
Celebrate!
At this point, you should have a cluster running slurm-bridge.
Recommended next steps involve reading through creating a workload, learning more about the architecture of slurm-bridge, or browsing our how-to guides on administrative tasks.
1.2 - Concepts
Concepts related to slurm-bridge internals and design.
1.2.1 - Admission
Overview
The Kubernetes documentation
defines admission controllers as:
a piece of code that intercepts requests to the Kubernetes API server prior to
persistence of the resource, but after the request is authenticated and
authorized.
It also states that:
Admission control mechanisms may be validating, mutating, or both. Mutating
controllers may modify the data for the resource being modified; validating
controllers may not.
The slurm-bridge admission controller is a mutating controller. It modifies any pods within certain namespaces (slurm-bridge, by default) to use the slurm-bridge scheduler instead of the default kube-scheduler.
Design
Any pods created in certain namespaces will have their .spec.schedulerName changed to our scheduler.
Managed namespaces are defined as a list of namespaces configured in the admission controller’s values.yaml under managedNamespaces[] (an example is shown below).
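A sketch of that setting, assuming the managedNamespaces[] list sits under an admission block (check the chart’s values.yaml for the exact key path):
admission:
  managedNamespaces:
    - slurm-bridge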
Sequence Diagram
sequenceDiagram
autonumber
participant KAPI as Kubernetes API
participant SBA as Slurm-Bridge Admission
KAPI-->>SBA: Watch Pod Create/Update
opt Pod in managed Namespaces
SBA->>KAPI: Update `.spec.schedulerName` and Tolerations
KAPI-->>SBA: Update Response
end %% opt Pod in managed Namespaces
1.2.2 - Architecture
Overview
This document describes the high-level architecture of the Slinky slurm-bridge.
Big Picture

Directory Map
This project follows the conventions of:
api/
Contains Custom Kubernetes API definitions. These become Custom Resource
Definitions (CRDs) and are installed into a Kubernetes cluster.
cmd/
Contains code to be compiled into binary commands.
config/
Contains yaml configuration files used for kustomize deployments.
docs/
Contains project documentation.
hack/
Contains files for development and Kubebuilder. This includes a kind.sh script
that can be used to create a kind cluster with all pre-requisites for local
testing.
helm/
Contains helm deployments, including the configuration files such as
values.yaml.
Helm is the recommended method to install this project into your Kubernetes
cluster.
internal/
Contains code that is used internally. This code is not externally importable.
internal/controller/
Contains the controllers.
Each controller is named after the Custom Resource Definition (CRD) it manages.
Currently, this consists of the nodeset and the cluster CRDs.
internal/scheduler/
Contains scheduling framework plugins. Currently, this consists of slurm-bridge.
1.2.3 - Controllers
Overview
The Kubernetes documentation
defines controllers as:
control loops that watch the state of your cluster, then make or request
changes where needed. Each controller tries to move the current cluster state
closer to the desired state.
Within slurm-bridge, there are multiple controllers that manage the state of different bridge components:
- Node Controller - Responsible for the state of nodes in the bridge cluster
- Workload Controller - Responsible for the state of pods and other workloads running on slurm-bridge
Node Controller
The node controller is responsible for tainting the managed nodes so that the scheduler component is fully in control of all workloads bound to those nodes.
Additionally, this controller will reconcile certain node states for scheduling
purposes. Slurm becomes the source of truth for scheduling among managed nodes.
A managed node is defined as a node that has a colocated kubelet and slurmd on the same physical host and that slurm-bridge can schedule on.
sequenceDiagram
autonumber
participant KAPI as Kubernetes API
participant SWC as Slurm Workload Controller
participant SAPI as Slurm REST API
loop Reconcile Loop
KAPI-->>SWC: Watch Kubernetes Nodes
alt Node is managed
SWC->>KAPI: Taint Node
KAPI-->>SWC: Taint Node
else
SWC->>KAPI: Untaint Node
KAPI-->>SWC: Untaint Node
end %% alt Node is managed
alt Node is schedulable
SWC->>SAPI: Drain Node
SAPI-->>SWC: Drain Node
else
SWC->>SAPI: Undrain Node
SAPI-->>SWC: Undrain Node
end %% alt Node is schedulable
end %% loop Reconcile Loop
Workload Controller
The workload controller reconciles Kubernetes Pods and Slurm Jobs. Slurm is the
source of truth for what workload is allowed to run on which managed nodes.
sequenceDiagram
autonumber
participant KAPI as Kubernetes API
participant SWC as Slurm Workload Controller
participant SAPI as Slurm REST API
loop Reconcile Loop
critical Map Slurm Job to Pod
KAPI-->>SWC: Watch Kubernetes Pods
SAPI-->>SWC: Watch Slurm Jobs
option Pod is Terminated
SWC->>SAPI: Terminate Slurm Job
SAPI-->>SWC: Return Status
option Job is Terminated
SWC->>KAPI: Evict Pod
KAPI-->>SWC: Return Status
end %% critical Map Slurm Job to Pod
end %% loop Reconcile Loop
1.2.4 - Scheduler
Overview
In Kubernetes, scheduling refers to making sure that pods are matched to nodes
so that the kubelet can run them.
The scheduler controller in slurm-bridge is responsible for scheduling eligible pods onto nodes that are managed by slurm-bridge. In doing so, the slurm-bridge scheduler interacts with the Slurm REST API in order to acquire allocations for its workloads. In slurm-bridge, slurmctld serves as the source of truth for scheduling decisions.
Design
This scheduler is designed to be a non-primary scheduler (i.e. it should not replace the default kube-scheduler). This means that only certain pods (e.g. non-critical pods) should be scheduled via this scheduler.
This scheduler represents Kubernetes Pods as a Slurm Job, waits for Slurm to
schedule the Job, then informs Kubernetes on which nodes to allocate the
represented Pods. This scheduler defers scheduling decisions to Slurm, hence
certain assumptions about the environment must be met for this to function
correctly.
Sequence Diagram
sequenceDiagram
autonumber
actor user as User
participant KAPI as Kubernetes API
participant SBS as Slurm-Bridge Scheduler
participant SAPI as Slurm REST API
loop Workload Submission
user->>KAPI: Submit Pod
KAPI-->>user: Return Request Status
end %% loop Workload Submission
loop Scheduling Loop
SBS->>KAPI: Get Next Pod in Workload Queue
KAPI-->>SBS: Return Next Pod in Workload Queue
note over SBS: Honor Slurm scheduling decision
critical Lookup Slurm Placeholder Job
SBS->>SAPI: Get Placeholder Job
SAPI-->>SBS: Return Placeholder Job
option Job is NotFound
note over SBS: Translate Pod(s) into Slurm Job
SBS->>SAPI: Submit Placeholder Job
SAPI-->>SBS: Return Submit Status
option Job is Pending
note over SBS: Check again later...
SBS->>SBS: Requeue
option Job is Allocated
note over SBS: Bind Pod(s) to Node(s) from the Slurm Job
SBS->>KAPI: Bind Pod(s) to Node(s)
KAPI-->>SBS: Return Bind Request Status
end %% Lookup Slurm Placeholder Job
end %% loop Scheduling Loop
1.3 - Tasks
Guides to tasks related to the administration of a cluster running slurm-bridge.
1.3.1 - Running slurm-bridge locally
You may want to run slurm-bridge on a single machine in order to test the software or familiarize yourself with it prior to installing it on your cluster. This should only be done for testing and evaluation purposes and should not be used for production environments.
We have provided the hack/kind.sh script to do this using kind.
This document assumes a basic understanding of Kubernetes architecture. It is highly recommended that those who are unfamiliar with the core concepts of Kubernetes review the documentation on Kubernetes, pods, and nodes before getting started.
Pre-requisites
- go 1.17+ must be installed on your system
Setting up your environment
- Install Kind using
go install
:
go install sigs.k8s.io/kind@v0.29.0
If you get kind: command not found when running the next step, you may need to add GOPATH to your PATH:
export GOPATH=$HOME/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
- Confirm that kind is working properly by running the following commands:
kind create cluster
kubectl get nodes --all-namespaces
kind delete cluster
- Clone the slurm-bridge repo and enter it:
git clone git@github.com:SlinkyProject/slurm-bridge.git
cd slurm-bridge
Installing slurm-bridge within your environment
Provided with slurm-bridge is the script hack/kind.sh that interfaces with kind to deploy the slurm-bridge helm chart within your local environment.
- Create your cluster using hack/kind.sh (a possible invocation is shown below):
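The exact flags depend on the version of the script, so consult hack/kind.sh --help first; the invocation below is only a guess at the default behavior:
# create a local kind cluster with slurm-bridge and its prerequisites (flags may vary)
hack/kind.sh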
- Familiarize yourself with and use your test environment:
kubectl get pods --namespace=slurm-bridge
kubectl get pods --namespace=slurm
kubectl get pods --namespace=slinky
Celebrate!
At this point, you should have a kind cluster running slurm-bridge.
Cleaning up
hack/kind.sh also provides a mechanism to destroy your test environment. To destroy your kind cluster, run:
hack/kind.sh --delete
1.3.2 - Creating a Workload
In Slurm, all workloads are represented by jobs. In slurm-bridge, however, there are a number of forms that workloads can take. While workloads can still be submitted as a Slurm job, slurm-bridge also enables users to submit workloads through Kubernetes. Most workloads that can be submitted to slurm-bridge from within Kubernetes are represented by an existing Kubernetes batch workload primitive.
At this time, slurm-bridge has scheduling support for Jobs, JobSets, Pods, and PodGroups. If your workload requires or benefits from co-scheduled pod launch (e.g. MPI, multi-node), consider representing your workload as a JobSet or PodGroup.
Using the slurm-bridge Scheduler
slurm-bridge uses an admission controller to control which resources are scheduled using the slurm-bridge-scheduler. The slurm-bridge-scheduler is designed as a non-primary scheduler and is not intended to replace the default kube-scheduler.
The slurm-bridge admission controller only schedules pods that request slurm-bridge as their scheduler or are in a configured namespace. By default, the slurm-bridge admission controller is configured to automatically use slurm-bridge as the scheduler for all pods in the configured namespaces.
Alternatively, a pod in any namespace can set Pod.Spec.schedulerName=slurm-bridge-scheduler to indicate that it should be scheduled using the slurm-bridge-scheduler.
You can learn more about the slurm-bridge admission controller here.
Annotations
Users can inform or influence how slurm-bridge represents their Kubernetes workload within Slurm by adding annotations on the parent object.
Example “pause” bare pod to illustrate annotations:
apiVersion: v1
kind: Pod
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  schedulerName: slurm-bridge-scheduler
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.6
      resources:
        limits:
          cpu: "1"
          memory: 100Mi
Example “pause” deployment to illustrate annotations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      schedulerName: slurm-bridge-scheduler
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.6
          resources:
            limits:
              cpu: "1"
              memory: 100Mi
JobSets
This section assumes the JobSet operator is installed.
JobSet pods will be co-scheduled and launched together. The JobSet controller is responsible for managing the JobSet status and other Pod interactions once marked as completed.
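As an illustration only (this manifest is not taken from the slurm-bridge repo and assumes the JobSet v1alpha2 API), a small JobSet submitted to the managed namespace could look roughly like this:
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: jobset-sleep
  namespace: slurm-bridge
spec:
  replicatedJobs:
    - name: workers
      replicas: 1
      template:
        spec:
          completions: 2
          parallelism: 2
          template:
            spec:
              containers:
                - name: sleep
                  image: busybox:stable
                  command: [sh, -c, sleep 3]
                  resources:
                    requests:
                      cpu: "1"
                      memory: 100Mi
              restartPolicy: Never
Because both pods belong to one JobSet, they will be co-scheduled and launched together, as described above.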
PodGroups
This section assumes the PodGroup CRD is installed and that the out-of-tree kube-scheduler is installed and configured as a (non-primary) scheduler.
Pods contained within a PodGroup will be co-scheduled and launched together. The
PodGroup controller is responsible for managing the PodGroup status and other
Pod interactions once marked as completed.
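A rough sketch of a PodGroup, assuming the scheduler-plugins scheduling.x-k8s.io/v1alpha1 API; how member pods reference the group (typically via a label) depends on the PodGroup controller you install, so check its documentation:
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: pod-group-sleep
  namespace: slurm-bridge
spec:
  minMember: 2
Pods that are members of this group would then be co-scheduled and launched together once all of them can be placed.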