Workloads

Overview

In Slurm, every workload is represented as a job. In slurm-bridge, workloads can take several forms: they can still be submitted as Slurm jobs, but slurm-bridge also enables users to submit workloads through Kubernetes. Most workloads submitted to slurm-bridge from within Kubernetes are represented by an existing Kubernetes batch workload primitive.

At this time, slurm-bridge has scheduling support for Jobs, JobSets, Pods, PodGroups, and LeaderWorkerSets. If your workload requires or benefits from co-scheduled pod launch (e.g., MPI or other multi-node workloads), consider representing it as a PodGroup or a LeaderWorkerSet (see the examples in the sections below).

Using the slurm-bridge Scheduler

slurm-bridge uses an admission controller to determine which resources are scheduled by the slurm-bridge-scheduler. The slurm-bridge-scheduler is designed as a non-primary scheduler and is not intended to replace the default kube-scheduler. The admission controller only selects pods that either request slurm-bridge as their scheduler or reside in a configured namespace. By default, the admission controller automatically sets slurm-bridge as the scheduler for all pods in the configured namespaces.

Alternatively, a pod in any namespace can set Pod.Spec.schedulerName=slurm-bridge-scheduler to indicate that it should be scheduled using the slurm-bridge-scheduler.
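
For example, a bare pod can opt in explicitly from an arbitrary namespace (the namespace my-team below is hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: opt-in-example
  namespace: my-team
spec:
  # Request the slurm-bridge scheduler explicitly, regardless of namespace
  schedulerName: slurm-bridge-scheduler
  containers:
    - name: main
      image: registry.k8s.io/pause:3.6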

Please review the slurm-bridge admission controller documentation to learn more.

Annotations

Users can inform or influence how slurm-bridge represents their Kubernetes workload within Slurm by adding annotations to the parent object.

Example “pause” bare pod to illustrate annotations:

apiVersion: v1
kind: Pod
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  schedulerName: slurm-bridge-scheduler
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.6
      resources:
        limits:
          cpu: "1"
          memory: 100Mi

Example “pause” deployment to illustrate annotations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      schedulerName: slurm-bridge-scheduler
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.6
          resources:
            limits:
              cpu: "1"
              memory: 100Mi
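
After applying either manifest, one way to confirm which scheduler placed the pods is to inspect the Scheduled events (the manifest filename below is hypothetical; with -o wide, the SOURCE column shows the scheduler name):

kubectl apply -f pause-deployment.yaml
kubectl get events --field-selector reason=Scheduled -o wide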

JobSets

This section assumes the JobSet operator is installed.

JobSet pods are scheduled on a per-pod basis. Once its pods are marked as completed, the JobSet controller is responsible for managing the JobSet status and any remaining pod interactions.
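
As a sketch (assuming the JobSet v1alpha2 API), the slurm-bridge annotations again go on the parent object, while each pod template selects the slurm-bridge-scheduler:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  replicatedJobs:
    - name: workers
      replicas: 1
      template:
        spec:
          parallelism: 2
          completions: 2
          template:
            spec:
              schedulerName: slurm-bridge-scheduler
              restartPolicy: Never
              containers:
                - name: pause
                  image: registry.k8s.io/pause:3.6
                  resources:
                    limits:
                      cpu: "1"
                      memory: 100Mi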

PodGroups

This section assumes the PodGroup CRD and the out-of-tree kube-scheduler controller for CoScheduling are installed:

helm install --repo https://scheduler-plugins.sigs.k8s.io scheduler-plugins scheduler-plugins \
  --namespace scheduler-plugins --create-namespace \
  --set 'plugins.enabled={CoScheduling}' --set 'scheduler.replicaCount=0'
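
One way to verify that the PodGroup CRD is present after installation (a sketch; output will vary):

kubectl get crd podgroups.scheduling.x-k8s.io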

Pods contained within a PodGroup will be co-scheduled and launched together. Once its pods are marked as completed, the PodGroup controller is responsible for managing the PodGroup status and any remaining pod interactions.
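
As a sketch (assuming the scheduler-plugins v1alpha1 API), the PodGroup carries the slurm-bridge annotations as the parent object, and member pods join the group via the scheduling.x-k8s.io/pod-group label:

apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  minMember: 2
---
apiVersion: v1
kind: Pod
metadata:
  name: pause-0
  labels:
    # Membership in the PodGroup above
    scheduling.x-k8s.io/pod-group: pause
spec:
  schedulerName: slurm-bridge-scheduler
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.6
      resources:
        limits:
          cpu: "1"
          memory: 100Mi

A second pod with the same label (not shown) would satisfy minMember: 2 and allow the group to launch.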

LeaderWorkerSet

This section assumes the LeaderWorkerSet operator is installed.

LeaderWorkerSet groups are co-scheduled, so the pods of each group are guaranteed to launch together.
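
A minimal sketch, assuming the leaderworkerset.x-k8s.io/v1 API; as with the other kinds, the slurm-bridge annotations go on the parent object:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: pause
  # `slurm-bridge` annotations on parent object
  annotations:
    slinky.slurm.net/timelimit: "5"
    slinky.slurm.net/account: foo
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 3
    workerTemplate:
      spec:
        schedulerName: slurm-bridge-scheduler
        containers:
          - name: pause
            image: registry.k8s.io/pause:3.6
            resources:
              limits:
                cpu: "1"
                memory: 100Mi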

Note

Topology-aware placement is not supported yet, so some features of LeaderWorkerSet may not behave as expected.