slurm-bridge

Run Slurm as a Kubernetes scheduler. A Slinky project.

Code Repository

The code repository can be accessed at https://github.com/SlinkyProject/slurm-bridge

Overview

Slurm and Kubernetes are workload managers originally designed for different kinds of workloads. In broad strokes, Kubernetes excels at scheduling workloads that typically run for an indefinite amount of time, with potentially vague resource requirements, on a single node, with loose policy, and it can scale its resource pool elastically to meet demand. Slurm excels at quickly scheduling workloads that run for a finite amount of time, with well-defined resource requirements and topology, on multiple nodes, with strict policy, but over a known, fixed resource pool.

This project combines the strengths of both workload managers. It contains a Kubernetes scheduler that manages selected workloads from Kubernetes.
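Kubernetes lets each pod choose its scheduler through the standard `spec.schedulerName` field, which is one common way a dedicated scheduler can be limited to a selected subset of workloads. The minimal sketch below builds such a pod manifest with the Python standard library. The scheduler name `slurm-bridge-scheduler` and the helper `make_pod` are hypothetical illustrations, not taken from this project; confirm the name your deployment actually registers.

```python
import json

# Assumption (hypothetical): the name under which the slurm-bridge scheduler
# registers with Kubernetes. Check your deployment for the actual value.
BRIDGE_SCHEDULER_NAME = "slurm-bridge-scheduler"


def make_pod(name: str, image: str, use_bridge: bool) -> dict:
    """Build a minimal Pod manifest as a plain dict.

    Kubernetes hands a pod to the scheduler named in spec.schedulerName;
    pods that leave the field unset are handled by the default scheduler.
    """
    spec = {"containers": [{"name": name, "image": image}]}
    if use_bridge:
        # Opt this pod into the (assumed) slurm-bridge scheduler.
        spec["schedulerName"] = BRIDGE_SCHEDULER_NAME
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Emit JSON that could be piped to `kubectl apply -f -`.
    print(json.dumps(make_pod("hello", "busybox", use_bridge=True), indent=2))
```

Pods created without the field keep using the default scheduler, so only workloads that explicitly opt in would be routed elsewhere.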

For additional architectural notes, see the architecture docs.

Supported Slurm Versions

Data Parser: v41

  • 24.05
  • 24.11
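As a sketch of how the compatibility list above might be checked in tooling, the snippet below reduces a full Slurm version to its release series and compares it against the supported series. It assumes that "v41" refers to the slurmrestd data parser v0.0.41, which appears in REST paths of the form `/slurm/v0.0.41/...`; the helpers `slurm_series`, `is_supported`, and `rest_path` are hypothetical and not part of this project.

```python
# Supported Slurm release series, taken from the list above.
SUPPORTED_SERIES = {"24.05", "24.11"}

# Assumption: "v41" means the slurmrestd data parser v0.0.41.
DATA_PARSER = "v0.0.41"


def slurm_series(version: str) -> str:
    """Reduce a full Slurm version such as '24.11.1' to its series '24.11'."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unrecognized Slurm version: {version!r}")
    return f"{parts[0]}.{parts[1]}"


def is_supported(version: str) -> bool:
    """Return True if the version belongs to a supported release series."""
    return slurm_series(version) in SUPPORTED_SERIES


def rest_path(resource: str) -> str:
    """Build a slurmrestd path prefixed with the assumed data parser version."""
    return f"/slurm/{DATA_PARSER}/{resource.lstrip('/')}"


if __name__ == "__main__":
    for v in ("24.05.4", "24.11.1", "23.11.10"):
        print(v, "supported" if is_supported(v) else "not supported")
    print(rest_path("jobs"))
```

Matching on the series rather than the full version reflects that the list above names release series, not individual patch releases.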

Overall Architecture

This is a basic overview of the architecture. A more in-depth description can be found in the docs directory.

Known Issues

  • Cgroups are currently disabled due to difficulties getting core information into the pods.
  • Updates may be slow because sequencing must complete before the slurm-controller can be deployed.

License

Copyright (C) SchedMD LLC.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this project except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.