Using SR-IOV with Slurm-operator
Overview
Single-root input/output virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself as multiple discrete devices. SR-IOV exposes “Virtual Functions” (VFs), which can be seen as additional devices on the PCI bus. These VFs can be attached to virtual machines and containers, allowing direct hardware access to network resources. SR-IOV has been shown to greatly improve network performance on virtualized systems, and is a key enabling technology for cloud-native HPC.
For more information on SR-IOV’s performance implications, see:
Pre-requisites
Neither of the deployment methods outlined in this document, nor any of their dependencies, can enable, create, or manage VFs at the hardware level. VFs must be configured and created manually before attempting either method.
Warning
VFs do not persist across reboots, so they must be re-created after every system reboot.
For more information on configuring SR-IOV on Intel and Mellanox hardware, see:
Before attempting either deployment method below, enable SR-IOV on your
cluster's nodes and create VFs. Once this is done, the Virtual Functions will
be visible in the output of lspci.
$ lspci | grep "Virtual Function"
01:00.2 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:00.5 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:00.6 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:00.7 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:01.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
01:01.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
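If no Virtual Functions are listed, they can usually be created through the Linux kernel's sysfs interface. The following is a minimal sketch, assuming SR-IOV is already enabled in the BIOS and in the NIC firmware, and that the driver exposes the standard sriov_totalvfs and sriov_numvfs files. The function name create_vfs, the interface name ens1f0, and the optional sysfs root argument are hypothetical and used only for illustration.

```shell
# Sketch: create COUNT Virtual Functions on the physical function IFACE.
# Must run as root on the node. The third argument (sysfs root) defaults
# to /sys and exists only to make the function easy to exercise in tests.
create_vfs() {
  vf_iface=$1
  vf_count=$2
  vf_root=${3:-/sys}
  vf_dev="$vf_root/class/net/$vf_iface/device"

  # sriov_totalvfs holds the maximum number of VFs the device supports.
  if [ ! -f "$vf_dev/sriov_totalvfs" ]; then
    echo "no SR-IOV support found for $vf_iface" >&2
    return 1
  fi
  vf_total=$(cat "$vf_dev/sriov_totalvfs")
  if [ "$vf_count" -gt "$vf_total" ]; then
    echo "$vf_iface supports at most $vf_total VFs" >&2
    return 1
  fi

  # The kernel rejects changing one non-zero VF count to another, so
  # reset to 0 first. Note that this destroys any existing VFs.
  echo 0 > "$vf_dev/sriov_numvfs"
  echo "$vf_count" > "$vf_dev/sriov_numvfs"
}
```

For example, running `create_vfs ens1f0 8` as root would request eight VFs on ens1f0. Because the setting is lost on reboot, a call like this would typically be placed in a boot-time unit or script of your choosing.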
Deployment Methods
Using NVIDIA Network Operator
The NVIDIA Network Operator provides the simplest deployment path for using SR-IOV with slurm-operator. During the installation of the NVIDIA Network Operator, the SR-IOV Network Operator can be deployed using a chart parameter.
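As a rough sketch, enabling the SR-IOV Network Operator at install time might look like the following. The Helm repository URL, the release name, the namespace, and the sriovNetworkOperator.enabled value are assumptions based on how the NVIDIA Network Operator chart is commonly installed, so check the chart values for the version you deploy. The wrapper function install_network_operator and the DRY_RUN variable are hypothetical conveniences.

```shell
# Sketch: install the NVIDIA Network Operator with the SR-IOV Network
# Operator enabled through a chart parameter.
# Assumed (verify for your chart version): repo URL, chart name,
# namespace, and the sriovNetworkOperator.enabled value.
install_network_operator() {
  # When DRY_RUN is set, each command is printed instead of executed.
  run=${DRY_RUN:+echo}
  $run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &&
  $run helm repo update &&
  $run helm install network-operator nvidia/network-operator \
    --namespace nvidia-network-operator --create-namespace \
    --set sriovNetworkOperator.enabled=true
}
```

Running the function with DRY_RUN=1 exported prints the commands for review before anything is applied to the cluster.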
DRA Driver for SR-IOV Virtual Functions
Dynamic Resource Allocation (DRA) is a Kubernetes feature that provides a flexible way to categorize, request, and use devices in a cluster. dra-driver-sriov provides a DRA driver that lets Kubernetes workloads request and use SR-IOV VFs through the native resource allocation system.
Slurm-operator has been verified to work with the SR-IOV DRA Driver. Refer to the installation guide for instructions on deploying dra-driver-sriov. Note that installing and configuring these tools is complex and highly site-specific.
Before installing dra-driver-sriov, you must:
Install a compatible CNI meta-plugin (reference)
Create an SR-IOV CRD for that CNI (reference)
Install sriov-cni
Install sriov-network-device-plugin
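To illustrate the second step above, the sketch below prints a NetworkAttachmentDefinition of the kind used by Multus with sriov-cni. It assumes Multus is the meta-plugin. The function name sriov_nad_manifest, the network name sriov-net, the resource name intel.com/intel_sriov_netdevice, and the subnet 10.56.217.0/24 are placeholders, not values from this guide; the resource name in particular must match whatever your device plugin advertises. The IPAM section deliberately contains no gateway and no default route, so the pod's default route stays on the cluster network.

```shell
# Sketch: print a NetworkAttachmentDefinition for an SR-IOV network.
# Arguments (all optional, all placeholder defaults):
#   $1 network name, $2 device plugin resource name, $3 subnet (CIDR)
sriov_nad_manifest() {
  cat <<EOF
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ${1:-sriov-net}
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ${2:-intel.com/intel_sriov_netdevice}
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "${1:-sriov-net}",
      "type": "sriov",
      "ipam": {
        "type": "host-local",
        "subnet": "${3:-10.56.217.0/24}"
      }
    }
EOF
}
```

The output can be reviewed and then piped to the cluster, for example with `sriov_nad_manifest | kubectl apply -f -`.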
Tip
If a Slinky NodeSet pod with an SR-IOV network interface gets stuck in
CrashLoopBackOff with logs indicating a failure to contact the Slurm
controller, modify the sriov-crd that you created or configure the order of
routes in your slurm-operator pod. The routes or gateway field in the IPAM
section of the SR-IOV CRD has likely caused the SR-IOV device to become the
default route for the slurm-worker-* pod. This prevents the pod from
resolving the slurm-controller, which must be done on the Kubernetes
internal network.
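One way to confirm this diagnosis is to check which interface carries the default route inside the affected pod. The helper below is a minimal sketch that reads `ip route` output on standard input and prints the device of the default route. The function name default_route_dev is hypothetical, and the usage example assumes that the pod image ships the ip tool and that the pod and namespace names match your deployment.

```shell
# Sketch: print the device used by the default route, given the output
# of `ip route` on standard input. Prints nothing if there is no
# default route.
default_route_dev() {
  awk '$1 == "default" {
         # Find the "dev" keyword and print the token that follows it.
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}
```

For example, `kubectl exec -n slurm slurm-worker-0 -- ip route | default_route_dev` (namespace and pod name are placeholders) would be expected to print the cluster network interface, commonly eth0. If it prints the SR-IOV interface instead, such as net1 under Multus naming, the IPAM routes or gateway are the likely cause.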