Installation Guide
Table of Contents
Overview
Installation instructions for the Slurm Operator on Kubernetes.
Slurm Operator And CRDs
Install the cert-manager with its CRDs, if not already installed:
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager \
--set 'crds.enabled=true' \
--namespace cert-manager --create-namespace
Install the slurm-operator and its CRDs:
helm install slurm-operator-crds oci://ghcr.io/slinkyproject/charts/slurm-operator-crds
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--namespace=slinky --create-namespace
Check if the slurm-operator deployed successfully:
kubectl --namespace=slinky get pods --selector='app.kubernetes.io/instance=slurm-operator'
The output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-operator-5d86d75979-6wflf 1/1 Running 0 1m
slurm-operator-webhook-567c84547b-kr7zq 1/1 Running 0 1m
With CRDs As Subchart
If you intend to manage the slurm-operator and the CRDs in the same helm
release, install it with the --set 'crds.enabled=true' argument.
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--set 'crds.enabled=true' \
--namespace=slinky --create-namespace
Without cert-manager
If the cert-manager is not installed, then install the chart with the
--set 'certManager.enabled=false' argument, to avoid signing certificates via
cert-manager.
helm install slurm-operator oci://ghcr.io/slinkyproject/charts/slurm-operator \
--set 'certManager.enabled=false' \
--namespace=slinky --create-namespace
Slurm Cluster
Install a Slurm cluster via helm chart:
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--namespace=slurm --create-namespace
Check if the Slurm cluster deployed successfully:
kubectl --namespace=slurm get pods
The output should be similar to:
NAME READY STATUS RESTARTS AGE
slurm-accounting-0 1/1 Running 0 2m
slurm-controller-0 3/3 Running 0 2m
slurm-exporter-6ffb9fdbbd-547zj 1/1 Running 0 2m
slurm-login-slinky-7ff66445b5-wdjkn 1/1 Running 0 2m
slurm-restapi-77b9f969f7-kh4r8 1/1 Running 0 2m
slurm-worker-slinky-0 2/2 Running 0 2m
With Accounting
You will need to configure Slurm accounting to point at a database. There are multiple methods to provide a database for Slurm.
Either use:
the mariadb-operator
the mysql-operator
any Slurm compatible database
mysql/mariadb compatible alternatives
managed cloud database service
Mariadb (Community Edition)
If you intend to enable accounting, install the mariadb-operator and its CRDs, if not already installed:
helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator
helm repo update
helm install mariadb-operator-crds mariadb-operator/mariadb-operator-crds
helm install mariadb-operator mariadb-operator/mariadb-operator \
--namespace mariadb --create-namespace
Create the slurm namespace.
kubectl create namespace slurm
Create a mariadb database via CR.
kubectl apply -f - <<EOF
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
name: mariadb
namespace: slurm
spec:
rootPasswordSecretKeyRef:
name: mariadb-root
key: password
generate: true
username: slurm
database: slurm_acct_db
passwordSecretKeyRef:
name: mariadb-password
key: password
generate: true
storage:
size: 1Gi
myCnf: |
[mariadb]
bind-address=*
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=4096M
innodb_lock_wait_timeout=900
innodb_log_file_size=1024M
max_allowed_packet=256M
EOF
Note
The mariadb database example above aligns with the Slurm chart’s default
accounting.storageConfig. If your actual database configuration is different, then you will have to update theaccounting.storageConfigto work with your configuration.
Then install a Slurm cluster via helm chart with the
--set 'accounting.enabled=true' argument.
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--set 'accounting.enabled=true' \
--namespace=slurm --create-namespace
With Metrics
If you intend to collect metrics, install prometheus and its CRDs, if not already installed:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--set 'installCRDs=true' \
--namespace prometheus --create-namespace
Then install a Slurm cluster via helm chart with the
--set 'slurm-exporter.enabled=true' argument.
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--set 'slurm-exporter.enabled=true' \
--namespace=slurm --create-namespace
With Login
You will need to configure the Slurm chart such that the login pods can communicate with an identity service via sssd.
Warning
In this example, you will need to supply an
sssd.conf(at~/sssd.conf) that is configured for your environment.
Install a Slurm cluster via helm chart with the
--set 'loginsets[slinky].enabled=true' and
--set-file "loginsets[slinky].sssdConf=~/sssd.conf" arguments.
helm install slurm oci://ghcr.io/slinkyproject/charts/slurm \
--set 'loginsets[slinky].enabled=true' \
--set-file "loginsets[slinky].sssdConf=~/sssd.conf" \
--namespace=slurm --create-namespace
Testing Slurm
SSH through the login service:
SLURM_LOGIN_IP="$(kubectl get services -n slurm slurm-login-slinky -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
SLURM_LOGIN_PORT="$(kubectl get services -n slurm slurm-login-slinky -o jsonpath='{.status.loadBalancer.ingress[0].ports[0].port}')"
## Assuming your public SSH key was configured in `loginsets[slinky].rootSshAuthorizedKeys`.
ssh -p ${SLURM_LOGIN_PORT:-22} root@${SLURM_LOGIN_IP}
## Assuming SSSD was configured correctly.
ssh -p ${SLURM_LOGIN_PORT:-22} ${USER}@${SLURM_LOGIN_IP}
Then, from a login pod, run Slurm commands to quickly test that Slurm is functioning:
sinfo
srun hostname
sbatch --wrap="sleep 60"
squeue
sacct
See Slurm Commands for more details on how to interact with Slurm.