Long running jobs (rcdetach)

Keeping DartFS credentials alive

When you log in to a Research Computing system (e.g. Discovery, Polaris, Andes), you get a 10-hour credential to access your DartFS storage. This covers your home directory and potentially data volumes for research labs. The credential can be extended for up to 30 days, but by default it is deleted when you log out.

So what do you do when you want to run a job that persists after you log out, or that is expected to last more than 10 hours?

Traditionally, you would run the job with a command called nohup (it has its own man page) and then push the process to the background. With DartFS storage you also need a way to keep your credential alive. The "krenew" command can do that (it also has its own man page). But this is starting to get complicated.
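To make the traditional approach concrete, here is a minimal sketch of the nohup part only. It uses a short stand-in job so it can run anywhere. The output file name and the job itself are hypothetical, and the credential renewal step (krenew) is deliberately left out, since its exact invocation depends on your site setup.

```shell
#!/bin/sh
# Sketch of the manual approach: detach a job with nohup and run it in the background.
# Assumption: on a DartFS system you would ALSO need to keep the credential alive
# (e.g. with krenew, see its man page). That step is not shown here.

out="${TMPDIR:-/tmp}/manual_job_$$.out"   # hypothetical output file name

# nohup makes the job ignore the hangup signal sent at logout;
# "&" pushes it to the background; stdout and stderr both go to $out.
nohup sh -c 'echo "job started"; sleep 1; echo "job finished"' > "$out" 2>&1 &
job_pid=$!

echo "job pid: $job_pid"
echo "output:  $out"

# Only for this demonstration: wait for the job so its output can be shown.
wait "$job_pid"
cat "$out"
```

In real use you would not wait for the job. You would note the pid and the output path, log out, and check the file later.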

That is where the "rcdetach" script comes in. rcdetach is simply a wrapper around nohup and kinit. It prompts you for your NetID and password so it can get a fresh credential, and then hides the details from you. It is easy to use. Here is an example that runs the sleep program for 600 seconds:

$ rcdetach sleep 600

So you can run Rscript, python, or any other terminal-based software in interactive mode for up to 30 days:

$ rcdetach Rscript my_very_long_analysis.R

or

$ rcdetach python my_very_long_analysis.py

You will be required to enter the password associated with your NetID. The standard output (i.e. what you would normally see on the terminal screen) is redirected into a .out file, so you can safely log off without missing any of the output. For example:

$ rcdetach sleep 600
Enter your NetID: dXXXXXe
Password for dXXXXXe@KIEWIT.DARTMOUTH.EDU:

krenew pid: 39513
sleep pid:  39532
output: /dartfs-hpc/rc/home/e/dXXXXXe/39492nohup.out

In this example, the entire output will be saved in /dartfs-hpc/rc/home/e/dXXXXXe/39492nohup.out
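Once a job is detached, the pid and output path printed at startup are what you need to check on it later. The following is a minimal sketch of that, using a stand-in background job so it runs on any system. The output path and the job are hypothetical. With rcdetach you would substitute the "sleep pid" (your program's pid) and the "output" path that it printed.

```shell
#!/bin/sh
# Stand-in for a detached job so this sketch runs anywhere.
# With rcdetach, use the pid and output path it printed instead.
out="${TMPDIR:-/tmp}/demo_$$nohup.out"      # hypothetical path
nohup sh -c 'echo "step 1 done"; exec sleep 30' > "$out" 2>&1 &
job_pid=$!
sleep 1   # give the job a moment to write its first line

# Is the job still running? "kill -0" sends no signal; it only checks the pid.
if kill -0 "$job_pid" 2>/dev/null; then
    status="running"
else
    status="finished"
fi
echo "job $job_pid is $status"

# Show the most recent output (use "tail -f" to follow it live).
tail -n 5 "$out"

# Stop the job early if it is no longer needed.
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true
```

From a later login session the job is no longer a child of your shell, so "wait" does not apply there. "kill -0", "tail" and "kill" work the same way as long as you run them on the same machine as the job.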

Details

Article ID: 76691
Created: Tue 4/23/19 2:45 PM
Modified: Thu 8/19/21 11:41 AM