System Requirements Guide

Table of Contents

Overview

This guide provides guidance on recommended hardware to run the Slurm Operator and Slurm clusters on Kubernetes.

Kubernetes

Hardware

Generally, your Kubernetes cluster should consist of more than one node where you have at least one control-plane.

Note

It is impossible for us to provide a minimum system requirement for your workloads.

Storage Class

It is recommended to have at least one storage class and a default storage class. Slurm and other services installed on your Kubernetes cluster may use Persistent Volume Claim (PVC) and Persistent Volume (PV).

Operator

The operator components consist of:

  • slurm-operator

  • slurm-operator-webhook

Operating System & Architecture

Slinky container images are built on a distroless image.

The following machine architectures are supported:

  • amd64 (x86_64)

  • arm64 (aarch64)

Inspect the OCI artifacts for specific details.

Hardware

The operator benefits from more cores and memory due to handling requests over the network and responding. The amount of cores and memory depends on how how many worker threads were configured and how busy the operator is.

Note

It is impossible for us to provide a minimum system requirement for your workloads. While the operator can run with 1 core and 1GB of memory, production usage may find these resources insufficient.

Slurm

Slurm components consist of:

  • slurmctld

  • slurmd

  • slurmdbd

  • slurmrestd

  • sackd

For more information, see the Slurm docs.

Operating System & Architecture

Slurm has broad support for Linux distributions and limited support for FreeBSD and NetBSD.

Slurm has been thoroughly tested on most popular Linux distributions using arm64 (aarch64), ppc64, and x86_64 architectures. Some features are limited to recent releases and newer Linux kernel versions.

See the Slurm doc for details.

The Slurm container images built for Slinky only cover a subset of Slurm’s operating system and architecture support.

The following machine architectures are supported:

  • amd64 (x86_64)

  • arm64 (aarch64)

Inspect the OCI artifacts for specific details.

Hardware

All Slurm daemons benefit from more cores and memory due to handling requests over the network and responding. The amount of cores and memory depends on how busy your cluster is. Due to internal data locks, there is a balance to core counts and single core performance. Some daemons are more sensitive than others. Some Slurm daemons benefit from fast storage in select areas of the filesystem. All daemons prefer to not have noisy neighbors, so to speak – other processes on the machine cause contention for cores and memory.

Below are notes of interest:

  • slurmctld

    • Scheduling benefits greatly from single core performance

    • StateSaveLocation benefits greatly from fast storage

  • slurmd

    • SlurmdSpoolDir benefits greatly from fast storage

    • Depending on the users’ jobs, hardware considerations should be made

  • slurmdbd

    • Benefits from being co-located on the Database machine, communicating over a socket instead of a network.

  • slurmrestd

    • Benefits from being co-located on the slurmctld and/or slurmdbd machine, communicating over a socket instead of a network.

  • sackd

    • Treat like munged for the purposes of system requirements

  • Database

    • Benefits from fast storage

See the field notes, slide 17, for notes on system requirements.

Note

It is impossible for us to provide a minimum system requirement for your workloads. While Slurm daemons can run with 1 core and 1GB of memory, production usage may find these resources insufficient.