#!/bin/bash
# Name of the job
#SBATCH --job-name=gpu_job
# Number of compute nodes
#SBATCH --nodes=1
# Number of tasks per node, in this case one
#SBATCH --ntasks-per-node=1
# Request the GPU partition
#SBATCH --partition=gpuq
# Request the GPU resources
#SBATCH --gres=gpu:2
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
# Show the GPUs allocated to this job
nvidia-smi
# Slurm sets CUDA_VISIBLE_DEVICES to the indices of the allocated GPUs
echo $CUDA_VISIBLE_DEVICES
# Report which node the job ran on
hostname
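Submit the script with sbatch; assuming it is saved as gpu_job.sh (an illustrative filename):

$ sbatch gpu_job.sh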
After the job runs, the output file (slurm-<jobid>.out by default) shows the requested resources, both in the nvidia-smi output and in the value of $CUDA_VISIBLE_DEVICES:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   33C    P0    39W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    40W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
0,1
p04.hpcc.dartmouth.edu
Your program needs to know which GPU it has been assigned. The submission template above uses $CUDA_VISIBLE_DEVICES to determine which GPU number(s) the job was assigned. You can pass the GPU number to your program as a command-line argument and then set the default GPU in your code:
./program_name $CUDA_VISIBLE_DEVICES
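As an example, here is a minimal CUDA sketch of this pattern (program_name and the argument handling are illustrative, not part of the template above). It assumes a single GPU index is passed; note that with --gres=gpu:2, $CUDA_VISIBLE_DEVICES expands to a comma-separated list such as 0,1, and atoi reads only the first index.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    // GPU index arrives as the first command-line argument,
    // e.g. ./program_name $CUDA_VISIBLE_DEVICES
    if (argc < 2) {
        fprintf(stderr, "usage: %s <gpu_id>\n", argv[0]);
        return 1;
    }
    int gpu = atoi(argv[1]);

    // Make this GPU the default device for all subsequent CUDA calls
    cudaError_t err = cudaSetDevice(gpu);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                gpu, cudaGetErrorString(err));
        return 1;
    }
    printf("Running on GPU %d\n", gpu);
    return 0;
}

Compile with nvcc (for example, nvcc -o program_name program_name.cu) before submitting the job.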
Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both batch and interactive jobs; an interactive example follows the sinfo output below.
$ sinfo -O gres -p gpuq
GRES
gpu:nvidia_a100_80gb
$ sinfo -O gres -p a5500
GRES
gpu:nvidia_rtx_a5500
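To request a specific GPU type, include the type string reported by sinfo in the --gres specification. For example, a sketch for one A100 on the gpuq partition (type strings and counts depend on the cluster's configuration):

#SBATCH --partition=gpuq
#SBATCH --gres=gpu:nvidia_a100_80gb:1

For an interactive job, the same resources can be requested through srun, for example:

$ srun --partition=gpuq --gres=gpu:1 --pty /bin/bash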