Using RStudio on the Discovery Cluster

Preferred method

The preferred method is to develop your R code in RStudio and run it on the cluster as a batch job rather than interactively (see the example script after this list). The development-to-test workflow is then:

  • Edit in RStudio and Cmd+S to save

  • Cmd+Tab to switch to a terminal, then up arrow + Enter to submit a new job
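
For example, a job script for this batch workflow might look like the following (a minimal sketch only; batch_r.slurm, analysis.R, and the partition/account values are placeholders to adapt to your own setup):

#!/bin/bash
#SBATCH --partition=preempt1            # adjust to your partition
#SBATCH --account=<your slurm account>
#SBATCH --job-name=batch_r              # a short name for your job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=1:00:00                  # run time limit (HH:MM:SS)
#SBATCH --output=slurm.%N.%j.out        # STDOUT output file

# Load R if your environment provides it as a module (module name may differ)
module load R

# Run the analysis non-interactively
Rscript analysis.R

Submitting it is then a single command you can re-run after each edit:

sbatch batch_r.slurm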

Interactive use on the cluster


Some kinds of data processing applications really benefit from interactive use of an app like RStudio or Jupyter Notebook.
This needs to be done responsibly, making sure you don't leave resources reserved and idle when you are done.

If needed, here is how you can run an interactive RStudio session on the Discovery HPC Cluster.
 

STEP 1:


 Connect to Discovery using your username and password
 Either connect directly to the desired node if your department has purchased access to one, or be assigned one using "srun --pty bash"
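
For example (netID is a placeholder for your own username, and x01 is a hypothetical node name):

# From your local machine, log in to the cluster
ssh netID@discovery7.dartmouth.edu

# Option 1: connect directly to a node your department has access to
ssh x01

# Option 2: let Slurm assign you a node
srun --pty bash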

STEP 2:


 Navigate to the directory you would like to work in, place the HPC_execution.slurm script (below) in it, and execute it with ./HPC_execution.slurm

The HPC_execution.slurm script, courtesy of Owen Wilkins:

#!/bin/bash

 

#SBATCH --partition=preempt1

#SBATCH --account=<your slurm account> # job queue

#SBATCH --job-name=int_r # Assign a short name to your job

#SBATCH --nodes=1 # Number of nodes you require

#SBATCH --ntasks=1 # Total # of tasks across all nodes

#SBATCH --cpus-per-task=10 # Cores per task (>1 for multithreaded tasks)

#SBATCH --time=3:00:00 # Total run time limit (HH:MM:SS)

#SBATCH --output=slurm.%N.%j.out # STDOUT output file

#SBATCH --error=slurm.%N.%j.err # STDERR output file (optional)

 

# Create temporary directory to be populated with directories to bind-mount in the container

# where writable file systems are necessary. Adjust path as appropriate for your computing environment.

workdir=$(python -c 'import tempfile; print(tempfile.mkdtemp())')

 

mkdir -p -m 700 ${workdir}/run ${workdir}/tmp ${workdir}/var/lib/rstudio-server

cat > ${workdir}/database.conf <<END

provider=sqlite

directory=/var/lib/rstudio-server

END

 

# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced

# libraries used by R) from spawning more threads than the number of processors

# allocated to the job.

#

# Set R_LIBS_USER to a path specific to rocker/rstudio to avoid conflicts with

# personal libraries from any R installation in the host environment

 

cat > ${workdir}/rsession.sh <<END

#!/bin/sh

export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}

exec /usr/lib/rstudio-server/bin/rsession "\${@}"

END

 

chmod +x ${workdir}/rsession.sh

 

export SINGULARITY_BIND="${workdir}/run:/run,${workdir}/tmp:/tmp,${workdir}/database.conf:/etc/rstudio/database.conf,${workdir}/rsession.sh:/etc/rstudio/rsession.sh,${workdir}/var/lib/rstudio-server:/var/lib/rstudio-server"

You should be able to get this to work with the Singularity container referenced in the singularity exec command ("docker://alemenze/abrfseurat").

The first time you run the script it will download and cache the container image; on subsequent runs it loads from the cache.

This image contains a collection of packages for single-cell RNA-seq data analysis in R, so you may want to replace it with one of your own Singularity containers in the singularity exec command.
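
For example, to swap in a different image you would change only the image reference on the singularity exec line near the bottom of the script (rocker/rstudio is shown here as a hypothetical substitute; any image that provides rstudio-server at /usr/lib/rstudio-server should work):

singularity exec --cleanenv -H $PWD:/home/rstudio docker://rocker/rstudio \

with the remaining rserver options left exactly as they are.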

# Do not suspend idle sessions.

# Alternative to setting session-timeout-minutes=0 in /etc/rstudio/rsession.conf

# https://github.com/rstudio/rstudio/blob/v1.4.1106/src/cpp/server/ServerSessionManager.cpp#L126

export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0

 

export SINGULARITYENV_USER=$(id -un)

export SINGULARITYENV_PASSWORD=$(echo $RANDOM | base64 | head -c 20)

# get unused socket per https://unix.stackexchange.com/a/132524

# tiny race condition between the python & singularity commands

readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')

cat 1>&2 <<END

 

1. SSH tunnel from your workstation using the following command:

 

ssh -N -L 8787:${HOSTNAME}:${PORT} ${SINGULARITYENV_USER}@discovery7.dartmouth.edu

 

and point your web browser to http://localhost:8787

 

2. log in to RStudio Server using the following credentials:

 

user: ${SINGULARITYENV_USER}

password: ${SINGULARITYENV_PASSWORD}

 

When done using RStudio Server, terminate the job by:

 

1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)

2. Issue the following command on the login node:

 

scancel -f ${SLURM_JOB_ID}

END

# Then run rserver inside the container:

singularity exec --cleanenv -H $PWD:/home/rstudio docker://alemenze/abrfseurat \
    /usr/lib/rstudio-server/bin/rserver --server-user ${USER} --www-port ${PORT} \
    --auth-none=0 \
    --auth-pam-helper-path=pam-helper \
    --auth-stay-signed-in-days=30 \
    --auth-timeout-minutes=0 \
    --rsession-path=/etc/rstudio/rsession.sh

printf 'rserver exited' 1>&2

 

(If you have a problem, please contact research.computing@dartmouth.edu, not Owen)
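
If you submit the script with sbatch instead of executing it directly, the connection instructions it prints (node, port, username, and password) land in the STDERR file named by the #SBATCH --error line, so you can read them with, for example:

cat slurm.*.err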

STEP 3:


- Run this command in a local terminal window (see the example after this list):
ssh -t netID@discovery7.dartmouth.edu -L PORT:localhost:PORT ssh clusterNode -L PORT:localhost:SOCKET
- Replace netID with your own NetID
- Replace PORT with a random four-digit number
- Replace SOCKET with the port number output by ./HPC_execution.slurm
- Replace clusterNode with the node name output by ./HPC_execution.slurm
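
For example, with a hypothetical NetID d12345x, local port 8001, and a node q01 with port 41263 reported by ./HPC_execution.slurm, the tunnel command becomes:

ssh -t d12345x@discovery7.dartmouth.edu -L 8001:localhost:8001 ssh q01 -L 8001:localhost:41263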

STEP 4:


- In your web browser, navigate to http://localhost:PORT
- Replace PORT with the same number used above

STEP 5:


In the resulting RStudio Server login window, enter the username and password output by ./HPC_execution.slurm
RStudio should open in your browser, and all child directories of your working directory will be available

 STEP 6: ESSENTIAL!

 When you are done with the interactive session, you must run scancel on your job to ensure that the resources are released for others to use, and shut down the tunnels you created. We don't want resources to be reserved and idle.
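
 For example, from the login node (the <jobID> placeholder is whatever squeue reports for your session):

 # Find the job ID of your RStudio session
 squeue -u $USER

 # Cancel it so the node is released
 scancel -f <jobID>

 Then close the local terminal window running the ssh tunnel (or press Ctrl+C / type exit) to shut the tunnel down.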

 
