Scheduler
Overview
In Kubernetes, scheduling refers to making sure that pods are matched to nodes so that the kubelet can run them.
The scheduler controller in slurm-bridge is responsible for scheduling
eligible pods onto nodes that are managed by slurm-bridge. In doing so, the
slurm-bridge scheduler interacts with the Slurm REST API in order to acquire
allocations for its workloads. In slurm-bridge, slurmctld serves as the
source of truth for scheduling decisions.
Design
This scheduler is designed to be a non-primary scheduler (i.e. it should not replace the default kube-scheduler). This means that only certain pods (e.g. non-critical pods) should be scheduled by it.
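In Kubernetes, a pod opts in to a non-default scheduler through its `spec.schedulerName` field; pods that leave it unset are handled by `default-scheduler`. The following is a minimal sketch of that selection rule, not the slurm-bridge implementation. The `Pod` struct is a simplified stand-in for the real API type, and the scheduler name `slurm-bridge-scheduler` is an assumption for illustration; check your deployment for the actual value.

```go
package main

import "fmt"

// Pod is a simplified stand-in for a Kubernetes Pod; only the fields
// needed for this illustration are modeled.
type Pod struct {
	Name          string
	SchedulerName string // mirrors spec.schedulerName
}

// bridgeSchedulerName is an assumed name used for illustration only.
const bridgeSchedulerName = "slurm-bridge-scheduler"

// ownedBy returns the pods that explicitly request the given scheduler.
// All other pods are left for whichever scheduler they name.
func ownedBy(pods []Pod, scheduler string) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.SchedulerName == scheduler {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "coredns", SchedulerName: "default-scheduler"},
		{Name: "train-0", SchedulerName: bridgeSchedulerName},
		{Name: "train-1", SchedulerName: bridgeSchedulerName},
	}
	for _, p := range ownedBy(pods, bridgeSchedulerName) {
		fmt.Println(p.Name)
	}
}
```

Critical cluster components keep using kube-scheduler under this rule, so they stay schedulable even if Slurm is unavailable.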
This scheduler represents Kubernetes Pods as a Slurm Job, waits for Slurm to schedule the Job, then informs Kubernetes which nodes the represented Pods should be placed on. Because this scheduler defers scheduling decisions to Slurm, certain assumptions about the environment must be met for it to function correctly.
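The submit, wait, and place flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the slurm-bridge code: `SlurmClient`, its methods, `fakeSlurm`, and `placePods` are hypothetical names that stand in for calls to the Slurm REST API, and the one-pod-per-node mapping is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// SlurmClient is a hypothetical interface standing in for calls to the
// Slurm REST API; the method names are illustrative, not real endpoints.
type SlurmClient interface {
	SubmitJob(podNames []string) (jobID int, err error)
	// AllocatedNodes returns nil while the job is still pending.
	AllocatedNodes(jobID int) ([]string, error)
}

// fakeSlurm keeps the job pending for a fixed number of polls.
type fakeSlurm struct {
	pendingPolls int
	nodes        []string
}

func (f *fakeSlurm) SubmitJob(podNames []string) (int, error) { return 1, nil }

func (f *fakeSlurm) AllocatedNodes(jobID int) ([]string, error) {
	if f.pendingPolls > 0 {
		f.pendingPolls--
		return nil, nil
	}
	return f.nodes, nil
}

// placePods represents the pods as one Slurm job, waits for Slurm to
// allocate nodes, and returns a pod-to-node assignment.
func placePods(c SlurmClient, pods []string, maxPolls int) (map[string]string, error) {
	jobID, err := c.SubmitJob(pods)
	if err != nil {
		return nil, err
	}
	for i := 0; i < maxPolls; i++ {
		nodes, err := c.AllocatedNodes(jobID)
		if err != nil {
			return nil, err
		}
		if nodes == nil {
			continue // still pending; a real loop would back off or requeue
		}
		if len(nodes) < len(pods) {
			return nil, errors.New("fewer nodes allocated than pods")
		}
		binding := make(map[string]string, len(pods))
		for j, p := range pods {
			binding[p] = nodes[j] // assumes one pod per allocated node
		}
		return binding, nil
	}
	return nil, errors.New("job still pending after max polls")
}

func main() {
	slurm := &fakeSlurm{pendingPolls: 2, nodes: []string{"node-a", "node-b"}}
	binding, err := placePods(slurm, []string{"train-0", "train-1"}, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(binding["train-0"], binding["train-1"])
}
```

The point of the sketch is that the placement decision comes entirely from Slurm's answer: the Kubernetes side only translates the allocation into node assignments.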