# GPU Resources
GPUs (Graphics Processing Units) are essential for highly parallel workloads such as deep learning and simulations. In an HPC context these devices no longer perform any graphics function, but the "graphics" name has stuck even though "accelerator" is the more apt term. To learn more about accelerated computing, visit https://blogs.nvidia.com/blog/what-is-accelerated-computing/.
NVIDIA GPUs are available in the `gpu` and `preempt` public partitions, as well as in some labs' private partitions.
When scheduling batch or interactive jobs that need a GPU, add the `--gres` option to your command. This lets you select the type and quantity of GPUs needed. The simplest request is `--gres=gpu:1`, e.g.:

```
$ srun -p preempt -n 2 --mem=4g --gres=gpu:1 -t 1:00:00 --pty bash
```

- `--gres`: Generic Resource
- `gpu:1`: requests one GPU (any GPU architecture available in the requested partition)
To request a specific GPU architecture, add it to the `--gres` option, e.g. `--gres=gpu:t4:1`:

```
$ srun -p preempt -n 2 --mem=4g --gres=gpu:t4:1 -t 1:00:00 --pty bash
```

- `--gres`: Generic Resource
- `gpu:t4:1`: requests one T4 GPU
You can allow your job to use more than one type of GPU (but not every type) with `--constraint`, e.g.:

```
$ srun -p preempt -n 2 --mem=4g --gres=gpu:1 --constraint="t4|p100|v100" -t 1:00:00 --pty bash
```

- `--constraint`: sets constraints on the resources allocated to the job
- `"t4|p100|v100"`: the job may use a GPU of type t4, p100, or v100, where the `|` symbol means "or"
**Warning**

DO NOT manually set `CUDA_VISIBLE_DEVICES`. Users can only see the GPU devices assigned to them, which can be checked with the `nvidia-smi` command. When submitting batch jobs, it is recommended to add the `nvidia-smi` command to your Slurm job submission script so that its output is included in the log for troubleshooting purposes.
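Putting the pieces above together, a minimal batch script might look like the following sketch. The partition, resource values, log file name, and the final workload command are placeholders; adjust them for your own job:

```shell
#!/bin/bash
#SBATCH -p preempt              # partition (placeholder; use gpu, preempt, or a lab partition)
#SBATCH -n 2                    # number of tasks
#SBATCH --mem=4g                # memory request
#SBATCH --gres=gpu:1            # request one GPU of any available type
#SBATCH -t 1:00:00              # time limit
#SBATCH -o myjob.%j.out         # log file (%j expands to the job ID)

# Record the assigned GPU(s) in the job log for troubleshooting
nvidia-smi

# Run your GPU workload (placeholder command)
python my_gpu_script.py
```

Submit the script with `sbatch`, and the `nvidia-smi` output will appear at the top of the job log.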
## Available GPUs
| GPU Model | Memory | Max per node | Partitions available in | Constraint | Notes |
|---|---|---|---|---|---|
| a100 | 40GB or 80GB | 8 | gpu, preempt | a100 | |
| p100 | 16GB | 6 | gpu, preempt | p100 | |
| v100 | 16GB | 4 | preempt | v100 | |
| t4 | 16GB | 4 | preempt | t4 | Max CUDA version is 10.2 |
| rtx_6000 | 24GB | 8 | preempt | rtx_6000 | |
| rtx_a6000 | 48GB | 8 | preempt | rtx_a6000 | |
| rtx_6000ada | 48GB | 4 | preempt | rtx_6000ada | |
| l40s | 48GB | 4 | preempt | l40s | |
| rtx_a5000 | 24GB | 8 | preempt | rtx_a5000 | |
| l40 | 48GB | 4 | preempt | l40 | |
| h100 | 80GB | 3 | preempt | h100 | |

The drivers for all GPU models except the t4 support CUDA 12.2.