Sample Slurm Script to Submit a Single-Processor Job
Create a script file that includes the details of the job that you want to run.
It can include the name of the program, the memory, walltime, and processor requirements of the job, which queue it should run in, and how to notify you of the results of the job.
Here is an example submit script:
#!/bin/bash
# Name of the job
#SBATCH --job-name=multicore_job
# Number of compute nodes
#SBATCH --nodes=1
# Number of tasks per node, in this case one (each task uses one core)
#SBATCH --ntasks-per-node=1
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
hostname
date
sleep 60
All of the lines that begin with #SBATCH are directives to Slurm. The meaning of each directive in the sample script is explained in the comment line that precedes it.
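The sample script does not request memory or a specific queue (partition), both of which were mentioned above. If your job needs them, directives such as the following can be added; the values here are only illustrative, and your site's partition names may differ:
#SBATCH --mem=4G
#SBATCH --partition=standard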
The full list of available directives is explained in the man page for the sbatch command, which is available on Discovery.
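For example, to read it on a login node:
man sbatch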
sbatch will copy the current shell environment, and the scheduler will recreate that environment on the allocated compute node when the job starts. The job script does NOT run .bashrc or .bash_profile, and so may not have the same environment as a fresh login shell. This is important if you use aliases, or the conda system to set up your own custom version of Python and sets of Python packages. Since conda defines shell functions, it must be configured before you can call, e.g., conda activate my-env.
The simplest way to do this is for the first line of your script to be:
#!/bin/bash -l
which explicitly starts bash as a login shell.
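Putting this together, here is a minimal sketch of a job script that activates a conda environment; my-env and my_script.py are placeholders for your own environment and program:
#!/bin/bash -l
# Name of the job
#SBATCH --job-name=conda_job
# One node, one task
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Walltime (job duration)
#SBATCH --time=00:15:00
# The -l on the first line starts bash as a login shell,
# so conda's shell functions are defined before we call them
conda activate my-env
python my_script.py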
Now submit the example script from the start of this section and check its status:
[john@discovery7 slurm]$ sbatch my_first_slurm.sh
Submitted batch job 4056
[john@discovery7 slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4056 standard multicor john R 0:01 1 p04
[john@discovery7 slurm]$ scontrol show job 4056
JobId=4056 JobName=multicore_job
UserId=john(48374) GroupId=rc-users(480987) MCS_label=rc
Priority=4294901747 Nice=0 Account=rc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:09 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=2021-05-14T12:25:53 EligibleTime=2021-05-14T12:25:53
AccrueTime=2021-05-14T12:25:53
StartTime=2021-05-14T12:25:54 EndTime=2021-05-14T12:40:54 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-05-14T12:25:54
Partition=standard AllocNode:Sid=discovery7:21489
ReqNodeList=(null) ExcNodeList=(null)
NodeList=p04
BatchHost=p04
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1,billing=2
Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/dartfs-hpc/rc/home/p/d18014p/xnode_tests/slurm/my_first_slurm.sh
WorkDir=/dartfs-hpc/rc/home/p/d18014p/xnode_tests/slurm
StdErr=/dartfs-hpc/rc/home/p/d18014p/xnode_tests/slurm/slurm-4056.out
StdIn=/dev/null
StdOut=/dartfs-hpc/rc/home/p/d18014p/xnode_tests/slurm/slurm-4056.out
Power=
MailUser=<email> MailType=BEGIN,END,FAIL
NtasksPerTRES:0
JOBID is the unique ID of the job; in this case it is 4056. In the example above I am issuing scontrol to view detailed information about my job.
The output file, slurm-4056.out, consists of three sections:
- A header section, Prologue, which gives information such as the JOBID, user name, and node list.
- A body section, which includes the user output to STDOUT.
- A footer section, Epilogue, which is similar to the header; a useful difference is the report of wallclock time towards the end.
Typically your job will create one file that joins STDOUT and STDERR. To have your job create separate files for STDOUT and STDERR, be sure to pass --output and --error. Here is an example:
--output=My_first_job-%x.%j.out
--error=My_first_job-%x.%j.err
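In these filename patterns, %x is replaced by the job name and %j by the job ID. The same options can also be given as directives inside the script, for example:
#SBATCH --output=My_first_job-%x.%j.out
#SBATCH --error=My_first_job-%x.%j.err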
File Management in a Batch Queue System
Sometimes you may be running the same program in multiple jobs, and you will need to keep the input and output files for each job separate.
One way to manage your data files is to have a separate directory for each job.
- Copy the required input files to the directory, then edit the batch script to include a line that changes to the directory containing the input files:
cd /path/to/where/your/input/files/are
- Place this line before the line where you issue the command to be run (see the sketch after this list).
- By default your job files will be created in the directory that you submit from.
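For example, here is a minimal sketch of such a script; the directory path is the placeholder from above, and my_program, input.dat, and output.dat stand in for your own program and data files:
#!/bin/bash
#SBATCH --job-name=per_dir_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:15:00
# Change to the directory that holds this job's input files
cd /path/to/where/your/input/files/are
# Run the program; its output is created in the current directory
./my_program input.dat > output.dat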