R#

Yucheng will have new documentation to replace this one

#

HPC Tools R package repos:

For R/4.0.0, use /cluster/tufts/hpc/tools/R/4.0.0

For R/4.1.1, use /cluster/tufts/hpc/tools/R/4.1

For R/4.2.0, use /cluster/tufts/hpc/tools/R/4.2.0

For R/4.3.0, use /cluster/tufts/hpc/tools/R/4.3.0

R Interactive Session#

An R interactive session refers to the mode of using the R programming language interactively, where users can directly enter commands, execute them, and immediately see the results. It allows users to explore data, test functions, and perform analyses in real-time within the R environment. This mode is particularly useful for tasks like data exploration, debugging code, and quick data manipulations.

Steps:

Login to the HPC cluster (new cluster Pax)

ssh your_username@login.pax.tufts.edu
From the login node, load R module (e.g. R/4.0.0)

$ module load R/4.0.0
Allocate computing resources. Start an interactive session with your desired number of cores and memory, here we are using 2 cores with 4GB of memory

$ srun -p interactive -n 2 --mem=4g --pty bash

Note: Interactive partition has a default 4-hour time limit

For more information on how to allocate resources on Tufts HPC cluster, please reference: Pax User Guide
Within the interactive session, you can start R

$ R

In R, you can install the packages you need in your home directory with:

> install.packages("XXX").

You can also use the packages installed in HPC Tools R package repo:

> LIB='/cluster/tufts/hpc/tools/R/4.0.0'

>.libPaths(c("",LIB))

If you are having trouble installing the packages you need, please contact tts-research@tufts.edu.

Exit from R command line interface:

> q()
To terminate interactive session

$ exit

R batch jobs#

R batch job refers to running R scripts or R commands in a batch mode, where the job is submitted to the computing cluster’s scheduler to be executed asynchronously. Batch jobs are typically used for computationally intensive tasks or tasks that require significant processing time, as they allow users to submit jobs and continue working without having to wait for the job to complete. This approach is useful for running R scripts that involve large datasets, complex calculations, or simulations that may take a long time to finish.

Steps:

Login to the HPC cluster
Upload your R script to the HPC cluster
Go to the directory/folder which contains your R script

Open your favorite text editor and write a slurm submission script similar to the following one batchjob.sh (name your own)

#!/bin/bash
#SBATCH -J myRjob  #job name
#SBATCH --time=00-00:20:00 #requested time
#SBATCH -p batch  #running on "batch" partition/queue
#SBATCH -n 2  #2 cores total
#SBATCH --mem=2g #requesting 2GB of RAM total
#SBATCH --output=myRjob.%j.out #saving standard output to file
#SBATCH --error=myRjob.%j.err  #saving standard error to file
#SBATCH --mail-type=ALL  #email optitions
#SBATCH --mail-user=Your_Tufts_Email @tufts.edu
module load R/4.0.0
Rscript --no-save your_rscript_name.R

You can find a sample script that you can copy to your own directory in /cluster/tufts/hpc/tools/slurm_scripts/R

Submit it with

$ sbatch batchjob.sh
If you are submitting multiple batch jobs to run the same script on different datasets, please make sure they are saving results to different files inside of your R script.