0.3.x
- 1: Overview
- 2: Architecture
- 3: Autoscaling
- 4: Cluster Controller
- 5: Development
- 6: NodeSet Controller
- 7: Pyxis
- 8: Slurm
- 9: Quickstart Guides
1 - Overview
Overview
Slurm and Kubernetes are workload managers originally designed for different kinds of workloads. In broad strokes: Kubernetes excels at scheduling workloads that typically run for an indefinite amount of time, with potentially vague resource requirements, on a single node, with loose policy, but can scale its resource pool infinitely to meet demand; Slurm excels at quickly scheduling workloads that run for a finite amount of time, with well defined resource requirements and topology, on multiple nodes, with strict policy, but its resource pool is known.
This project enables the best of both workload managers, unified on Kubernetes. It contains a Kubernetes operator to deploy and manage certain components of Slurm clusters. This repository implements custom controllers and custom resource definitions (CRDs) designed for the lifecycle (creation, upgrade, graceful shutdown) of Slurm clusters.
For additional architectural notes, see the architecture docs.
Slurm Cluster
Slurm clusters are very flexible and can be configured in various ways. Our Slurm helm chart provides a reference implementation that is highly customizable and tries to expose everything Slurm has to offer.
For additional information about Slurm, see the slurm docs.
Features
NodeSets
A set of homogeneous Slurm nodes (compute nodes, workers), which are delegated to execute the Slurm workload.
The operator takes the workload running on Slurm nodes into consideration when it needs to scale in, upgrade, or otherwise handle node failures. Slurm nodes are marked as drain before their eventual termination pending scale-in or upgrade.
The operator supports NodeSet scale to zero, scaling the resource down to zero replicas. Hence, NodeSets pair well with any Horizontal Pod Autoscaler (HPA) that also supports scale to zero.
Slurm
Slurm is a full-featured HPC workload manager. To highlight a few features (a brief job-submission example follows the list):
- Accounting: collect accounting information for every job and job step executed.
- Partitions: job queues with sets of resources and constraints (e.g. job size limit, job time limit, users permitted).
- Reservations: reserve resources for jobs being executed by select users and/or select accounts.
- Job Dependencies: defer the start of jobs until the specified dependencies have been satisfied.
- Job Containers: jobs which run an unprivileged OCI container bundle.
- MPI: launch parallel MPI jobs, supports various MPI implementations.
- Priority: assigns priorities to jobs upon submission and on an ongoing basis (e.g. as they age).
- Preemption: stop one or more low-priority jobs to let a high-priority job run.
- QoS: sets of policies affecting scheduling priority, preemption, and resource limits.
- Fairshare: distribute resources equitably among users and accounts based on historical usage.
- Node Health Check: periodically check node health via script.
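Several of these features surface directly at job submission. As a brief sketch (the partition name and job ID here are illustrative, not part of this project):
# Submit to a partition with a one-hour time limit.
sbatch --partition=debug --time=01:00:00 --wrap="srun hostname"
# Defer a job until job 123 completes successfully.
sbatch --dependency=afterok:123 --wrap="echo done"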
Limitations
Installation
Install the slurm-operator:
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--namespace=slinky --create-namespace
Install a Slurm cluster:
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--namespace=slurm --create-namespace
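To confirm both releases deployed, you can list the pods in each namespace (the same check used in the quickstart guides):
kubectl --namespace=slinky get pods
kubectl --namespace=slurm get pods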
For additional instructions, see the quickstart guide.
Upgrades
0.X Releases
Breaking changes may be introduced into newer CRDs. To upgrade between these versions, uninstall all Slinky charts and delete Slinky CRDs, then install the new release like normal.
helm --namespace=slurm uninstall slurm
helm --namespace=slinky uninstall slurm-operator
kubectl delete crd clusters.slinky.slurm.net
kubectl delete crd nodesets.slinky.slurm.net
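Before installing the new release, you can optionally confirm that no Slinky CRDs remain (the grep pattern assumes the slinky.slurm.net API group shown above):
kubectl get crds | grep slinky.slurm.net || echo "no Slinky CRDs found"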
Documentation
Project documentation is located in the docs directory of this repository.
Slinky documentation can be found here.
License
Copyright (C) SchedMD LLC.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this project except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
2 - Architecture
Overview
This document describes the high-level architecture of the Slinky slurm-operator.
Operator
The following diagram illustrates the operator, from a communication perspective.
The slurm-operator follows the Kubernetes operator pattern.
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.
The slurm-operator has one controller for each Custom Resource Definition (CRD) that it is responsible for managing. Each controller has a control loop where the state of the Custom Resource (CR) is reconciled.
Often, an operator is only concerned about data reported by the Kubernetes API. In our case, we are also concerned about data reported by the Slurm API, which influences how the slurm-operator reconciles certain CRs.
Slurm
The following diagram illustrates a containerized Slurm cluster, from a communication perspective.
For additional information about Slurm, see the slurm docs.
Hybrid
The following hybrid diagram is an example. There are many different configurations for a hybrid setup. The core takeaways are: slurmd can be on bare-metal and still be joined to your containerized Slurm cluster; external services that your Slurm cluster needs or wants (e.g. AD/LDAP, NFS, MariaDB) do not have to live in Kubernetes to be functional with your Slurm cluster.
Autoscale
Kubernetes supports resource autoscaling. In the context of Slurm, autoscaling Slurm compute nodes can be quite useful when your Kubernetes and Slurm clusters have workload fluctuations.
See the autoscaling guide for additional information.
Directory Map
This project follows the conventions of:
api/
Contains Custom Kubernetes API definitions. These become Custom Resource Definitions (CRDs) and are installed into a Kubernetes cluster.
cmd/
Contains code to be compiled into binary commands.
config/
Contains yaml configuration files used for kustomize deployments.
docs/
Contains project documentation.
hack/
Contains files for development and Kubebuilder. This includes a kind.sh script that can be used to create a kind cluster with all pre-requisites for local testing.
helm/
Contains helm deployments, including the configuration files such as values.yaml.
Helm is the recommended method to install this project into your Kubernetes cluster.
internal/
Contains code that is used internally. This code is not externally importable.
internal/controller/
Contains the controllers.
Each controller is named after the Custom Resource Definition (CRD) it manages. Currently, this consists of the nodeset and the cluster CRDs.
3 - Autoscaling
Getting Started
Before attempting to autoscale NodeSets, Slinky should be fully deployed to a Kubernetes cluster and Slurm jobs should be able to run.
Dependencies
Autoscaling requires additional services that are not included in Slinky. Follow the upstream documentation to install Prometheus, Metrics Server, and KEDA.
Prometheus will install tools to report metrics and view them with Grafana. The Metrics Server is needed to report CPU and memory usage for tools like kubectl top. KEDA is recommended for autoscaling as it provides usability improvements over the standard Horizontal Pod Autoscaler (HPA).
To add the KEDA helm repository, run:
helm repo add kedacore https://kedacore.github.io/charts
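Then install KEDA into its own namespace (a typical install following the upstream chart defaults):
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace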
Install the slurm-exporter. This chart is installed as a dependency of the slurm helm chart by default. Configure using helm/slurm/values.yaml.
Verify KEDA Metrics API Server is running
$ kubectl get apiservice -l app.kubernetes.io/instance=keda
NAME SERVICE AVAILABLE AGE
v1beta1.external.metrics.k8s.io keda/keda-operator-metrics-apiserver True 22h
KEDA provides the metrics apiserver required by HPA to scale on custom metrics from Slurm. An alternative like Prometheus Adapter could be used for this, but KEDA offers usability enhancements and improvements to HPA in addition to including a metrics apiserver.
Autoscaling
Autoscaling NodeSets allows Slurm partitions to expand and contract in response to the CPU and memory usage. Using Slurm metrics, NodeSets may also scale based on Slurm specific information like the number of pending jobs or the size of the largest pending job in a partition. There are many ways to configure autoscaling. Experiment with different combinations based on the types of jobs being run and the resources available in the cluster.
NodeSet Scale Subresource
Scaling a resource in Kubernetes requires that resources such as Deployments and StatefulSets support the scale subresource. This is also true of the NodeSet Custom Resource.
The scale subresource gives a standard interface to observe and control the number of replicas of a resource. In the case of NodeSet, it allows Kubernetes and related services to control the number of slurmd replicas running as part of the NodeSet.
To manually scale a NodeSet, use the kubectl scale command. In this example, the NodeSet (nss) slurm-compute-radar is scaled to 1.
$ kubectl scale -n slurm nss/slurm-compute-radar --replicas=1
nodeset.slinky.slurm.net/slurm-compute-radar scaled
$ kubectl get pods -o wide -n slurm -l app.kubernetes.io/instance=slurm-compute-radar
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
slurm-compute-radar-0 1/1 Running 0 2m48s 10.244.4.17 kind-worker <none> <none>
This corresponds to the Slurm partition radar.
$ kubectl exec -n slurm statefulset/slurm-controller -- sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
radar up infinite 1 idle kind-worker
NodeSets may be scaled to zero. In this case, there are no replicas of slurmd running, and all jobs scheduled to that partition will remain in a pending state.
$ kubectl scale nss/slurm-compute-radar -n slurm --replicas=0
nodeset.slinky.slurm.net/slurm-compute-radar scaled
For NodeSets to scale on demand, an autoscaler needs to be deployed. KEDA allows resources to scale from 0<->1 and also creates an HPA to scale based on scalers like Prometheus and more.
KEDA ScaledObject
KEDA uses the Custom Resource ScaledObject to monitor and scale a resource. It will automatically create the HPA needed to scale based on external triggers like Prometheus. With Slurm metrics, NodeSets may be scaled based on data collected from the Slurm restapi.
This example ScaledObject will watch the number of jobs pending for the partition radar and scale the NodeSet slurm-compute-radar until a threshold value is satisfied or maxReplicaCount is reached.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scale-radar
spec:
  scaleTargetRef:
    apiVersion: slinky.slurm.net/v1alpha1
    kind: NodeSet
    name: slurm-compute-radar
  idleReplicaCount: 0
  minReplicaCount: 1
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metricType: Value
      metadata:
        serverAddress: http://prometheus-kube-prometheus-prometheus.prometheus:9090
        query: slurm_partition_pending_jobs{partition="radar"}
        threshold: "5"
Note: The Prometheus trigger is using metricType: Value instead of the default AverageValue. AverageValue calculates the replica count by averaging the threshold across the current replica count.
Check ScaledObject documentation for a full list of allowable options.
In this scenario, the ScaledObject scale-radar will query the Slurm metric slurm_partition_pending_jobs from Prometheus with the label partition="radar".
When there is activity on the trigger (at least one pending job), KEDA will scale the NodeSet to minReplicaCount and then let HPA handle scaling up to maxReplicaCount or back down to minReplicaCount. When there is no activity on the trigger after a configurable amount of time, KEDA will scale the NodeSet to idleReplicaCount. See the KEDA documentation on idleReplicaCount for more examples.
Note: The only supported value for idleReplicaCount is 0 due to limitations on how the HPA controller works.
To verify a KEDA ScaledObject, apply it to the cluster in the appropriate namespace on a NodeSet that has no replicas.
$ kubectl scale nss/slurm-compute-radar -n slurm --replicas=0
nodeset.slinky.slurm.net/slurm-compute-radar scaled
Wait for Slurm to report that the partition has no nodes.
slurm@slurm-controller-0:/tmp$ sinfo -p radar
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
radar up infinite 0 n/a
Apply the ScaledObject using kubectl to the correct namespace and verify the KEDA and HPA resources are created.
$ kubectl apply -f scaledobject.yaml -n slurm
scaledobject.keda.sh/scale-radar created
$ kubectl get -n slurm scaledobjects
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE
scale-radar slinky.slurm.net/v1alpha1.NodeSet slurm-compute-radar 1 5 prometheus True False Unknown Unknown 28s
$ kubectl get -n slurm hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-scale-radar NodeSet/slurm-compute-radar <unknown>/5 1 5 0 32s
Once the ScaledObject and HPA are created, initiate some jobs to test that the NodeSet scale subresource is scaled in response.
$ sbatch --wrap "sleep 30" --partition radar --exclusive
The NodeSet will scale to minReplicaCount in response to activity on the trigger. Once the number of pending jobs crosses the configured threshold (submit more exclusive jobs to the partition), more replicas will be created to handle the additional demand. Until the threshold is exceeded, the NodeSet will remain at minReplicaCount.
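One way to exercise this is to submit a burst of exclusive jobs and watch the HPA react (an illustrative sketch; adjust the job count to your threshold):
for i in $(seq 1 10); do sbatch --wrap "sleep 300" --partition radar --exclusive; done
kubectl --namespace=slurm get hpa --watch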
Note: This example only works well for single node jobs, unless threshold is set to 1. In that case, HPA will continue to scale up the NodeSet as long as there is a pending job, until it reaches the maxReplicaCount.
After the default coolDownPeriod of 5 minutes without activity on the trigger, KEDA will scale the NodeSet down to 0.
4 - Cluster Controller
Overview
This controller is responsible for managing and reconciling the Cluster CRD. A Cluster CR represents communication to a Slurm cluster via slurmrestd and auth/jwt.
This controller uses the Slurm client library.
Sequence Diagram
sequenceDiagram
  autonumber
  actor User as User
  participant KAPI as Kubernetes API
  participant CC as Cluster Controller
  box Operator Internals
    participant SCM as Slurm Client Map
    participant SEC as Slurm Event Channel
  end %% Operator Internals
  note over KAPI: Handle CR Creation
  User->>KAPI: Create Cluster CR
  KAPI-->>CC: Watch Cluster CRD
  CC->>+KAPI: Get referenced secret
  KAPI-->>-CC: Return secret
  create participant SC as Slurm Client
  CC->>+SC: Create Slurm Client for Cluster
  SC-->>-CC: Return Slurm Client Status
  loop Watch Slurm Nodes
    SC->>+SAPI: Get Slurm Nodes
    SAPI-->>-SC: Return Slurm Nodes
    SC->>SEC: Add Event for Cache Delta
  end %% loop Watch Slurm Nodes
  CC->>SCM: Add Slurm Client to Map
  CC->>+SC: Ping Slurm Control Plane
  SC->>+SAPI: Ping Slurm Control Plane
  SAPI-->>-SC: Return Ping
  SC-->>-CC: Return Ping
  CC->>KAPI: Update Cluster CR Status
  note over KAPI: Handle CR Deletion
  User->>KAPI: Delete Cluster CR
  KAPI-->>CC: Watch Cluster CRD
  SCM-->>CC: Lookup Slurm Client
  destroy SC
  CC-)SC: Shutdown Slurm Client
  CC->>SCM: Remove Slurm Client from Map
  participant SAPI as Slurm REST API
5 - Development
This document aims to provide enough information that you can get started with development on this project.
Getting Started
You will need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against your choice of remote cluster.
Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).
Dependencies
Install KIND and Golang binaries for pre-commit hooks.
sudo apt-get install golang
make install
Pre-Commit
Install pre-commit and install the git hooks.
sudo apt-get install pre-commit
pre-commit install
Docker
Install Docker and configure rootless Docker.
Afterward, test that your user account can communicate with Docker.
docker run hello-world
Helm
Install Helm.
sudo snap install helm --classic
Skaffold
Install Skaffold.
curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 && \
sudo install skaffold /usr/local/bin/
If google-cloud-sdk is installed, skaffold is available as an additional component.
sudo apt-get install -y google-cloud-cli-skaffold
Kubernetes Client
Install kubectl.
sudo snap install kubectl --classic
If google-cloud-sdk is installed, kubectl is available as an additional component.
sudo apt-get install -y kubectl
Running on the Cluster
For development, all Helm deployments use a values-dev.yaml. If they do not exist in your environment yet or you are unsure, safely copy the values.yaml as a base by running:
make values-dev
Automatic
You can use Skaffold to build and push images, and deploy components using:
cd helm/slurm-operator/
skaffold run
NOTE: The skaffold.yaml is configured to inject the image and tag into the values-dev.yaml so they are correctly referenced.
Operator
The slurm operator aims to follow the Kubernetes Operator pattern.
It uses Controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
Install CRDs
When deploying a helm chart with skaffold or helm, the CRDs defined in its crds/ directory will be installed if not already present in the cluster.
Uninstall CRDs
To delete the Operator CRDs from the cluster:
make uninstall
WARNING: CRDs do not upgrade! The old ones must be uninstalled first so the new ones can be installed. This should only be done in development.
Modifying the API Definitions
If you are editing the API definitions, generate the manifests such as CRs or CRDs using:
make manifests
Slurm Version Changed
If the Slurm version has changed, generate the new OpenAPI spec and its golang client code using:
make generate
NOTE: Update code interacting with the API in accordance with the slurmrestd plugin lifecycle.
Running the operator locally
Install the operator’s CRDs with make install.
Launch the operator via the VSCode debugger using the “Launch Operator” launch task.
Because the operator will be running outside of Kubernetes and needs to communicate with the Slurm cluster, set the following options in your Slurm helm chart’s values.yaml (see the YAML sketch after this list):
debug.enable=true
debug.localOperator=true
If running on a Kind cluster, also set:
debug.disableCgroups=true
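Expressed in the Slurm chart’s values-dev.yaml, these options would look roughly like the following (a sketch assuming the keys map one-to-one to the dotted names above):
debug:
  enable: true
  localOperator: true
  # Only needed when running on a Kind cluster.
  disableCgroups: true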
If the Slurm helm chart is being deployed with skaffold, run skaffold run --port-forward --tail. It is configured to automatically port-forward the restapi for the local operator to communicate with the Slurm cluster.
If skaffold is not used, manually run kubectl port-forward --namespace slurm services/slurm-restapi 6820:6820 for the local operator to communicate with the Slurm cluster.
After starting the operator, verify it is able to contact the Slurm cluster by checking that the Cluster CR has been marked ready:
$ kubectl get --namespace slurm clusters.slinky.slurm.net
NAME READY AGE
slurm true 110s
See skaffold port-forwarding to learn how skaffold automatically detects which services to forward.
Slurm Cluster
Get into a Slurm pod that can submit workload.
kubectl --namespace=slurm exec -it deployments/slurm-login -- bash -l
kubectl --namespace=slurm exec -it statefulsets/slurm-controller -- bash -l
Alternatively, reach the login service through its LoadBalancer and ssh in (on a kind cluster, cloud-provider-kind provides the load balancer):
cloud-provider-kind -enable-lb-port-mapping &
SLURM_LOGIN_PORT="$(kubectl --namespace=slurm get services -l app.kubernetes.io/name=login,app.kubernetes.io/instance=slurm -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ports[0].port}")"
SLURM_LOGIN_IP="$(kubectl --namespace=slurm get services -l app.kubernetes.io/name=login,app.kubernetes.io/instance=slurm -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}")"
ssh -p "$SLURM_LOGIN_PORT" "${USER}@${SLURM_LOGIN_IP}"
6 - NodeSet Controller
Overview
The nodeset controller is responsible for managing and reconciling the NodeSet CRD, which represents a set of homogeneous Slurm Nodes.
Design
This controller is responsible for managing and reconciling the NodeSet CRD. In addition to the regular responsibility of managing resources in Kubernetes via the Kubernetes API, this controller should take into consideration the state of Slurm to make certain reconciliation decisions.
Sequence Diagram
sequenceDiagram
  autonumber
  actor User as User
  participant KAPI as Kubernetes API
  participant NS as NodeSet Controller
  box Operator Internals
    participant SCM as Slurm Client Map
    participant SEC as Slurm Event Channel
  end %% Operator Internals
  participant SC as Slurm Client
  participant SAPI as Slurm REST API
  loop Watch Slurm Nodes
    SC->>+SAPI: Get Slurm Nodes
    SAPI-->>-SC: Return Slurm Nodes
    SC->>SEC: Add Event for Cache Delta
  end %% loop Watch Slurm Nodes
  note over KAPI: Handle CR Update
  SEC-->>NS: Watch Event Channel
  User->>KAPI: Update NodeSet CR
  KAPI-->>NS: Watch NodeSet CRD
  opt Scale-out Replicas
    NS->>KAPI: Create Pods
  end %% Scale-out Replicas
  opt Scale-in Replicas
    SCM-->>NS: Lookup Slurm Client
    NS->>+SC: Drain Slurm Node
    SC->>+SAPI: Drain Slurm Node
    SAPI-->>-SC: Return Drain Slurm Node Status
    SC-->>-NS: Drain Slurm Node
    alt Slurm Node is Drained
      NS->>KAPI: Delete Pod
    else
      NS->>NS: Check Again Later
    end %% alt Slurm Node is Drained
  end %% opt Scale-in Replicas
7 - Pyxis
Overview
This guide describes how to configure your Slurm cluster to use pyxis (and enroot), a Slurm SPANK plugin for containerized jobs with NVIDIA GPU support.
Configure
Configure plugstack.conf to include the pyxis configuration.
Warning: In plugstack.conf, you must use glob syntax to avoid slurmctld failure while trying to resolve the paths in the includes. Only the login and slurmd pods should actually have the pyxis libraries installed.
slurm:
  configFiles:
    plugstack.conf: |
      include /usr/share/pyxis/*
...
Configure one or more NodeSets and the login pods to use a pyxis OCI image.
login:
  image:
    repository: ghcr.io/slinkyproject/login-pyxis
...
compute:
  nodesets:
    - name: debug
      image:
        repository: ghcr.io/slinkyproject/slurmd-pyxis
...
To make enroot activity in the login container permissible, set securityContext.privileged=true.
login:
  image:
    repository: ghcr.io/slinkyproject/login-pyxis
  securityContext:
    privileged: true
Test
Submit a job to a Slurm node.
$ srun --partition=debug grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 24.04.2 LTS"
Submit a job to a Slurm node with pyxis and it will launch in its requested container.
$ srun --partition=debug --container-image=alpine:latest grep PRETTY /etc/os-release
pyxis: importing docker image: alpine:latest
pyxis: imported docker image: alpine:latest
PRETTY_NAME="Alpine Linux v3.21"
Warning: SPANK plugins will only work on the specific Slurm nodes that have them installed and are configured to use them. It is best to constrain where jobs run with --partition=<partition>, --batch=<features>, and/or --constraint=<features> to ensure a compatible computing environment.
If the login container has securityContext.privileged=true, enroot activity is permissible. You can test the functionality with the following:
enroot import docker://alpine:latest
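From there, the imported image can be instantiated and started with enroot as well (a sketch; the filename assumes enroot’s default alpine+latest.sqsh output from the import above):
enroot create --name alpine alpine+latest.sqsh
enroot start alpine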
8 - Slurm
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.
Architecture
See the Slurm architecture docs for more information.
9 - Quickstart Guides
9.1 - QuickStart Guide for Amazon EKS
This quickstart guide will help you get the slurm-operator running and deploy Slurm clusters to Amazon EKS.
Setup
Setup a cluster on EKS.
eksctl create cluster \
--name slinky-cluster \
--region us-west-2 \
--nodegroup-name slinky-nodes \
--node-type t3.medium \
--nodes 2
Setup kubectl to point to your new cluster.
aws eks --region us-west-2 update-kubeconfig --name slinky-cluster
Pre-Requisites
Install the pre-requisite helm charts.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace --set crds.enabled=true
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace prometheus --create-namespace --set installCRDs=true
Install EBS CSI Driver
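The chart lives in the kubernetes-sigs helm repository, which is not among the repositories added above; if it is missing, add it first (repo URL per the upstream aws-ebs-csi-driver project):
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo update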
helm install aws-ebs-csi aws-ebs-csi-driver/aws-ebs-csi-driver -n kube-system
AWS Permissions
You will need to make sure your IAM user has the proper permissions.
Step 1: Identify the IAM Role
Run the following AWS CLI command to get the IAM role attached to your EKS worker nodes:
aws eks describe-nodegroup \
--cluster-name slinky-cluster \
--nodegroup-name slinky-nodes \
--query "nodegroup.nodeRole" \
--output text
This will return something like:
arn:aws:iam::017820679962:role/eksctl-slurm-cluster-nodegroup-my-nod-NodeInstanceRole-hpbRU4WRvvlK
Step 2: Attach the Required IAM Policy for EBS CSI Driver
Attach the AmazonEBSCSIDriverPolicy managed IAM policy to this role.
Run the following command:
aws iam attach-role-policy \
--role-name eksctl-slurm-cluster-nodegroup-my-nod-NodeInstanceRole-hpbRU4WRvvlK \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
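You can optionally confirm the policy is attached before proceeding, reusing the role name from the previous step:
aws iam list-attached-role-policies \
--role-name eksctl-slurm-cluster-nodegroup-my-nod-NodeInstanceRole-hpbRU4WRvvlK \
--output table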
Create StorageClass
You will need to create a StorageClass to use.
Here is an example storageclass.yaml file for a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2
  fsType: ext4
Create the StorageClass using your storageclass.yaml file.
kubectl apply -f storageclass.yaml
Slurm Operator
Download values and install the slurm-operator. You will need to update the operator and webhook repository values to point to the desired container repository.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm-operator/values.yaml \
-o values-operator.yaml
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--values=values-operator.yaml --version=0.3.0 --namespace=slinky --create-namespace
Make sure the cluster deployed successfully with:
kubectl --namespace=slinky get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-operator-7444c844d5-dpr5h 1/1 Running 0 5m00s
slurm-operator-webhook-6fd8d7857d-zcvqh 1/1 Running 0 5m00s
Slurm Cluster
Download values and install a Slurm cluster.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm/values.yaml \
-o values-slurm.yaml
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--values=values-slurm.yaml --version=0.3.0 --namespace=slurm --create-namespace
Make sure the slurm cluster deployed successfully with:
kubectl --namespace=slurm get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-accounting-0 1/1 Running 0 5m00s
slurm-compute-debug-0 1/1 Running 0 5m00s
slurm-controller-0 2/2 Running 0 5m00s
slurm-exporter-7b44b6d856-d86q5 1/1 Running 0 5m00s
slurm-mariadb-0 1/1 Running 0 5m00s
slurm-restapi-5f75db85d9-67gpl 1/1 Running 0 5m00s
Testing
To test Slurm functionality, connect to the controller to use Slurm client commands:
kubectl --namespace=slurm exec \
-it statefulsets/slurm-controller -- bash --login
On the controller pod (e.g. host slurm@slurm-controller-0), run the following commands to quickly test Slurm is functioning:
sinfo
srun hostname
sbatch --wrap="sleep 60"
squeue
See Slurm Commands for more details on how to interact with Slurm.
9.2 - QuickStart Guide for Google GKE
This quickstart guide will help you get the slurm-operator running and deploy Slurm clusters to GKE.
Setup
Setup a cluster on GKE.
gcloud container clusters create slinky-cluster \
--location=us-central1-a \
--num-nodes=2 \
--node-taints "" \
--machine-type=c2-standard-16
Setup kubectl to point to your new cluster.
gcloud container clusters get-credentials slinky-cluster
Pre-Requisites
Install the pre-requisite helm charts.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace --set crds.enabled=true
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace prometheus --create-namespace --set installCRDs=true
Slurm Operator
Download values and install the slurm-operator. You will need to update the operator and webhook repository values to point to the desired container repository.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm-operator/values.yaml \
-o values-operator.yaml
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--values=values-operator.yaml --version=0.3.0 --namespace=slinky --create-namespace
Make sure the cluster deployed successfully with:
kubectl --namespace=slinky get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-operator-7444c844d5-dpr5h 1/1 Running 0 5m00s
slurm-operator-webhook-6fd8d7857d-zcvqh 1/1 Running 0 5m00s
Slurm Cluster
Download values and install a Slurm cluster.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm/values.yaml \
-o values-slurm.yaml
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--values=values-slurm.yaml --version=0.3.0 --namespace=slurm --create-namespace
Make sure the slurm cluster deployed successfully with:
kubectl --namespace=slurm get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-accounting-0 1/1 Running 0 5m00s
slurm-compute-debug-0 1/1 Running 0 5m00s
slurm-controller-0 2/2 Running 0 5m00s
slurm-exporter-7b44b6d856-d86q5 1/1 Running 0 5m00s
slurm-mariadb-0 1/1 Running 0 5m00s
slurm-restapi-5f75db85d9-67gpl 1/1 Running 0 5m00s
Testing
To test Slurm functionality, connect to the controller to use Slurm client commands:
kubectl --namespace=slurm exec \
-it statefulsets/slurm-controller -- bash --login
On the controller pod (e.g. host slurm@slurm-controller-0), run the following commands to quickly test Slurm is functioning:
sinfo
srun hostname
sbatch --wrap="sleep 60"
squeue
See Slurm Commands for more details on how to interact with Slurm.
9.3 - QuickStart Guide for Microsoft AKS
This quickstart guide will help you get the slurm-operator running and deploy Slurm clusters to AKS.
Setup
Setup a resource group on AKS
az group create --name slinky --location westus2
Setup a cluster on AKS
az aks create \
--resource-group slinky \
--name slinky \
--location westus2 \
--node-vm-size Standard_D2s_v3
Setup kubectl to point to your new cluster.
az aks get-credentials --resource-group slinky --name slinky
Pre-Requisites
Install the pre-requisite helm charts.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace --set crds.enabled=true
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace prometheus --create-namespace --set installCRDs=true
Slurm Operator
Download values and install the slurm-operator. You will need to update the operator and webhook repository values to point to the desired container repository.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm-operator/values.yaml \
-o values-operator.yaml
Make sure you are authenticated and the proper role is assigned to pull your images.
az acr login -n slinky
az aks show \
--resource-group slinky \
--name slinky \
--query identityProfile.kubeletidentity.clientId \
-o tsv
az role assignment create --assignee <clientId from above> \
--role AcrPull \
--scope $(az acr show --name slinky --query id -o tsv)
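To double-check the assignment, you can list it with the same clientId (an optional verification):
az role assignment list --assignee <clientId from above> --output table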
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--values=values-operator.yaml --version=0.3.0 --namespace=slinky --create-namespace
Make sure the cluster deployed successfully with:
kubectl --namespace=slinky get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-operator-7444c844d5-dpr5h 1/1 Running 0 5m00s
slurm-operator-webhook-6fd8d7857d-zcvqh 1/1 Running 0 5m00s
Slurm Cluster
Download values and install a Slurm cluster.
curl -L https://raw.githubusercontent.com/SlinkyProject/slurm-operator/refs/tags/v0.3.0/helm/slurm/values.yaml \
-o values-slurm.yaml
By default, the values-slurm.yaml file uses standard for controller.persistence.storageClass and mariadb.primary.persistence.storageClass. You will need to update these values to default to use AKS’s default storageClass.
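The override would look roughly like this in values-slurm.yaml (a sketch assuming the key paths named above):
controller:
  persistence:
    storageClass: default
mariadb:
  primary:
    persistence:
      storageClass: default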
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--values=values-slurm.yaml --version=0.3.0 --namespace=slurm --create-namespace
Make sure the slurm cluster deployed successfully with:
kubectl --namespace=slurm get pods
Output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-accounting-0 1/1 Running 0 5m00s
slurm-compute-debug-0 1/1 Running 0 5m00s
slurm-controller-0 2/2 Running 0 5m00s
slurm-exporter-7b44b6d856-d86q5 1/1 Running 0 5m00s
slurm-mariadb-0 1/1 Running 0 5m00s
slurm-restapi-5f75db85d9-67gpl 1/1 Running 0 5m00s
Testing
To test Slurm functionality, connect to the controller to use Slurm client commands:
kubectl --namespace=slurm exec \
-it statefulsets/slurm-controller -- bash --login
On the controller pod (e.g. host slurm@slurm-controller-0), run the following commands to quickly test Slurm is functioning:
sinfo
srun hostname
sbatch --wrap="sleep 60"
squeue
See Slurm Commands for more details on how to interact with Slurm.