There are many new Slurm commands available on the Discovery cluster.
Common user commands in Slurm include:
Command | Example | Description
sbatch | sbatch <job script> | Submit a batch job to the queue
squeue | squeue | Show the status of Slurm batch jobs
srun | srun <job script> | Run an interactive job
sinfo | sinfo | Show information about partitions
scontrol | scontrol show job <JOBID> | Check the status of a running or idle job
scancel | scancel <JOBID> | Cancel a job
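For example, once you have submitted a job and have its job ID (4311 here is just a placeholder), you can inspect it with scontrol:
$ scontrol show job 4311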
Batch jobs
To run a job in batch mode, first prepare a job script that specifies the application you want to launch and the resources required to run it. Then, use the sbatch command to submit your job script to Slurm.
For complete documentation about the sbatch command and its options, see the sbatch manual page via: man sbatch
Example submit script:
Slurm job scripts most commonly have at least one executable line preceded by a list of options that specify the resources and attributes needed to run your job (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors).
- A job script for running a batch job on Discovery may look similar to the following:
#!/bin/bash
# Name of the job
#SBATCH --job-name=my_first_slurm_job
# Names of the output and error files (%j is replaced by the job ID)
#SBATCH -o filename_%j.txt
#SBATCH -e filename_%j.err
# Number of compute nodes
#SBATCH --nodes=1
# Number of tasks per node
#SBATCH --ntasks-per-node=1
# Number of CPUs per task
#SBATCH --cpus-per-task=1
# Request memory
#SBATCH --mem=8G
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications (comma-separated options: BEGIN,END,FAIL)
#SBATCH --mail-type=FAIL
module load module_name
./my_program arg1 arg2
In the above example:
- The first line indicates that the script should be read using the Bash command interpreter.
- The next lines are #SBATCH directives used to pass options to the sbatch command:
  - --job-name specifies a name for the job allocation. The specified name will appear along with the job ID number when you query running jobs on the system.
  - -o filename_%j.txt and -e filename_%j.err instruct Slurm to connect the job's standard output and standard error, respectively, to the specified file names, where %j is automatically replaced by the job ID.
  - --mail-type=<type> directs Slurm to send job-related email when an event of the specified type(s) occurs; valid type values include ALL, BEGIN, END, and FAIL.
  - --nodes=1 requests that one node be allocated to this job.
  - --ntasks-per-node=1 specifies that one task should be launched per node.
  - --cpus-per-task=1 specifies that one CPU should be allocated per task.
  - --mem=8G requests 8 GB of memory.
  - --time=00:15:00 requests a walltime of 15 minutes.
- The last two lines are the executable lines that the job will run. In this case, the module command is used to load a specified module before launching the specified binary (my_program) with its arguments (arg1 and arg2). In your script, replace module_name, my_program, and the arguments with your module's name, your program's name, and any needed arguments.
- A job script for running a batch job on the GPU nodes should contain --partition gpuq and the --gres flag to indicate the type of GPU (k80 or v100) and the number of GPUs (1 to 4) to be allocated for the job. For example:
#!/bin/bash
#SBATCH -J job_name
#SBATCH --partition gpuq
#SBATCH --gres=gpu:k80:2
#SBATCH -o filename_%j.txt
#SBATCH -e filename_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=02:00:00
module load module_name
./my_program my_program_arguments
In your script, replace my_program and my_program_arguments with your program's name and any needed arguments.
Depending on the resources needed to run your executable lines, you may need to include other sbatch options in your job script. Here are a few other useful ones:
Option | Action
--begin=YYYY-MM-DDTHH:MM:SS | Defer allocation of your job until the specified date and time, after which the job is eligible to execute. For example, to defer allocation of your job until 10:30pm June 14, 2021, use: --begin=2021-06-14T22:30:00
--no-requeue | Specify that the job is not rerunnable. Setting this option prevents the job from being requeued after it has been interrupted, for example, by a scheduled downtime or preemption by a higher-priority job.
--export=ALL | Export all environment variables in the sbatch command's environment to the batch job.
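These options can also be given on the sbatch command line rather than inside the script; for instance (my_job.script is a placeholder name):
$ sbatch --begin=2021-06-14T22:30:00 --no-requeue --export=ALL my_job.script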
Submit your job script
To submit your job script (for example, my_job.script), use the sbatch command. If the command runs successfully, it will return a job ID to standard output; for example, on Discovery:
$ sbatch my_job.script
Submitted batch job 4311
MPI jobs
To run an MPI job, add #SBATCH directives to your script to request the required resources and add the srun command as an executable line to launch your application. For example, a job script for running an MPI job that launches 96 tasks across two nodes in the general partition on Discovery could look similar to the following:
#!/bin/bash
#SBATCH -J mpi_job
#SBATCH -o mpi_%j.txt
#SBATCH -e mpi_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=00:30:00
cd /directory/with/stuff
srun my_program my_program_arguments
In your script, replace my_program and my_program_arguments with your program's name and any needed arguments.
OpenMP and hybrid OpenMP-MPI jobs
To run an OpenMP or hybrid OpenMP-MPI job, use the srun command and add the necessary #SBATCH directives as in the previous example, but also add an executable line that sets the OMP_NUM_THREADS environment variable to indicate the number of threads that should be used for parallel regions. For example, a job script for running a hybrid OpenMP-MPI job that launches 16 tasks on each of two nodes in the standard partition on Discovery could look similar to the following:
#!/bin/bash
#SBATCH -J hybrid_job
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:05:00
export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun my_program my_program_arguments
In your script, replace my_program and my_program_arguments with your program's name and any needed arguments.
You can also bind tasks to CPUs with the srun command's --cpu-bind option. For example, to modify the previous example so that it binds tasks to sockets, add the --cpu-bind=sockets option to the srun command:
#!/bin/bash
#SBATCH -J hybrid_job
#SBATCH -o hybrid_%j.txt
#SBATCH -e hybrid_%j.err
#SBATCH --mail-type=ALL
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --time=00:05:00
export OMP_NUM_THREADS=2
cd /directory/with/stuff
srun --cpu-bind=sockets my_program my_program_arguments
In your script, replace my_program and my_program_arguments with your program's name and any needed arguments.
Supported binding options include --cpu-bind=mask_cpu:<list>, which binds by setting CPU masks on tasks as indicated in the specified list. To view all available CPU bind options, on the Discovery command line, enter:
$ srun --cpu-bind=help
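As a sketch of the mask form (the masks here are hypothetical and assume the CPU numbering on your node), the following srun line binds tasks alternately to CPUs 0-1 (mask 0x3) and CPUs 2-3 (mask 0xC):
srun --cpu-bind=mask_cpu:0x3,0xC my_program my_program_arguments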
Interactive jobs
To request resources for an interactive job, use the srun command with the --pty option. For example:
$ srun --pty /bin/bash
$ hostname
p04.hpcc.dartmouth.edu
$
Jobs submitted with srun --pty /bin/bash will be assigned the cluster default values of 1 CPU and 1024 MB of memory. The account must also be specified, or the job will not run. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned two nodes, each with 4 CPUs (one per task) and 4 GB of memory:
$ srun --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --cpus-per-task=1 --pty /bin/bash
[q06 ~]$
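Because an account may be required, you can specify one with srun's --account option (account_name is a placeholder for your own account):
$ srun --account=account_name --pty /bin/bash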
When the requested resources are allocated to your job, you will be placed at the command prompt on a cluster compute node. Once you are placed on a compute node, you can begin executing your code interactively.
Note:
When you are finished with your interactive session, on the command line, enter exit to free the allocated resources.
For complete documentation about the srun command, see the srun manual page via: man srun
Monitor or delete your job
To monitor the status of jobs in a Slurm partition, use the squeue command. Some useful squeue options include:
-a Display information for all jobs.
-j <jobid> Display information for the specified job ID.
-j <jobid> -o %all Display all information fields (with a vertical bar separating each field) for the specified job ID.
-l Display information in long format.
-n <job_name> Display information for the specified job name.
-p <partition_name> Display jobs in the specified partition.
-t <state_list> Display jobs that have the specified state(s). Valid jobs states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT.
-u <username> Display jobs owned by the specified user.
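For example, to list only your own running jobs (the username jdoe is a placeholder):
$ squeue -u jdoe -t RUNNING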
For complete documentation about the squeue command, see the squeue manual page via: man squeue
To delete your pending or running job, use the scancel command with your job's job ID; for example, to delete your job that has a job ID of 4632, on the command line, enter:
$ scancel 4632
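If you need to cancel all of your own jobs at once, scancel also accepts a username (jdoe is a placeholder):
$ scancel -u jdoe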
For complete documentation about the scancel command, see the scancel manual page via: man scancel
View partition and compute node information
To view information about the nodes and partitions that Slurm manages, use the sinfo command.
By default, sinfo (without any options) displays:
- All partition names
- Availability of each partition
- Maximum wall time allowed for jobs in each partition
- Number of compute nodes in each partition
- State of the compute nodes in each partition
- Names of the compute nodes in each partition
To display node-specific information, use sinfo -N, which will list:
- All node names
- Partition to which each node belongs
- State of each node
To display additional node-specific information, use sinfo -lN, which adds the following fields to the previous output (see the example after this list):
- Number of cores per node
- Number of sockets per node, cores per socket, and threads per core
- Size of memory per node in megabytes
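For example, to show the long node listing for a single partition (gpuq, one of the partitions shown below), enter:
$ sinfo -lN -p gpuq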
You can also customize the fields that sinfo displays by using its -o (--format) option with format specifications such as the following:
Specification | Field displayed
%<#>P | Partition name (set field width to # characters)
%<#>N | List of node names (set field width to # characters)
%<#>c | Number of cores per node (set field width to # characters)
%<#>m | Size of memory per node in megabytes (set field width to # characters)
%<#>l | Maximum wall time allowed (set field width to # characters)
%<#>s | Maximum number of nodes allowed per job (set field width to # characters)
%<#>G | Generic resource associated with a node (set field width to # characters)
For example:
$ sinfo -No "%10P %8N %4c %7m %12l %10G"
The resulting output looks similar to this:
PARTITION NODELIST CPUS MEMORY TIMELIMIT GRES
gpuq g08 16 128640 infinite gpu:k80:4(
gpuq g10 16 128640 infinite gpu:k80:4(
gpuq g11 16 128640 infinite gpu:k80:4(
bigmem k25 16 64132 infinite (null)
bigmem k26 16 64132 infinite (null)
bigmem k27 16 64132 infinite (null)
bigmem k28 16 64132 infinite (null)
bigmem k29 16 64132 infinite (null)
bigmem k30 16 64132 infinite (null)
bigmem k31 16 64132 infinite (null)
bigmem k32 16 64132 infinite (null)
bigmem k33 16 64132 infinite (null)
bigmem k34 16 64132 infinite (null)
bigmem k35 16 64132 infinite (null)
bigmem k36 16 64132 infinite (null)
bigmem k37 16 64132 infinite (null)
bigmem k38 16 64132 infinite (null)
bigmem k39 16 64132 infinite (null)
bigmem k40 16 64132 infinite (null)
bigmem k41 16 64132 infinite (null)
For complete documentation about the sinfo command, see the sinfo manual page via: man sinfo
Credit https://kb.iu.edu/d/awrz