#!/bin/bash
# Name of the job
#SBATCH --job-name=gpu_job
# Number of compute nodes
#SBATCH --nodes=1
# Number of tasks per node, in this case one
#SBATCH --ntasks-per-node=1
# Request the GPU partition
#SBATCH --partition=gpuq
# Request the GPU resources
#SBATCH --gres=gpu:2
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
# Show the GPUs allocated to this job
nvidia-smi
# Slurm sets CUDA_VISIBLE_DEVICES to the indices of the allocated GPUs
echo $CUDA_VISIBLE_DEVICES
# Report which node the job ran on
hostname
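Submit the script with sbatch; assuming it is saved as gpu_job.sh (an illustrative filename):

$ sbatch gpu_job.sh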
After the job runs, the output file (slurm-<jobid>.out by default) shows the requested resources, both in the nvidia-smi output and in the value of $CUDA_VISIBLE_DEVICES:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   33C    P0    39W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    40W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
0,1
p04.hpcc.dartmouth.edu
Your program needs to know which GPU it has been assigned. The submission template above uses $CUDA_VISIBLE_DEVICES to determine which GPU number(s) the job was assigned. You can pass the GPU number to your program as a command-line argument and then set the default GPU in your code:
./program_name $CUDA_VISIBLE_DEVICES
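As an example, here is a minimal CUDA sketch of this pattern (program_name and the argument handling are illustrative, not part of the template above). It assumes a single GPU index is passed; note that with --gres=gpu:2, $CUDA_VISIBLE_DEVICES expands to a comma-separated list such as 0,1, and atoi reads only the first index.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    // GPU index arrives as the first command-line argument,
    // e.g. ./program_name $CUDA_VISIBLE_DEVICES
    if (argc < 2) {
        fprintf(stderr, "usage: %s <gpu_id>\n", argv[0]);
        return 1;
    }
    int gpu = atoi(argv[1]);

    // Make this GPU the default device for all subsequent CUDA calls
    cudaError_t err = cudaSetDevice(gpu);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                gpu, cudaGetErrorString(err));
        return 1;
    }
    printf("Running on GPU %d\n", gpu);
    return 0;
}

Compile with nvcc (for example, nvcc -o program_name program_name.cu) before submitting the job.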
Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both batch and interactive jobs; an interactive example follows the sinfo output below.
$ sinfo -O gres -p gpuq
GRES
gpu:nvidia_a100_80gb
$ sinfo -O gres -p a5500
GRES
gpu:nvidia_rtx_a5500
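To request a specific GPU type, include the type string reported by sinfo in the --gres specification. For example, a sketch for one A100 on the gpuq partition (type strings and counts depend on the cluster's configuration):

#SBATCH --partition=gpuq
#SBATCH --gres=gpu:nvidia_a100_80gb:1

For an interactive job, the same resources can be requested through srun, for example:

$ srun --partition=gpuq --gres=gpu:1 --pty /bin/bash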