Discovery Cluster details

Discovery is a Linux cluster that, in aggregate, contains 128 nodes, 6,712 CPU cores, 54.7 TB of memory, and more than 2.8 PB of disk space.

Node Hardware Breakdown

Cell       Vendor  CPU                              Cores  RAM    GPU          Scratch  Nodes
a          Dell    AMD EPYC 75F3 (2.95GHz)          64     1TB    Ampere A100  5.9TB    a01-a05
p          Dell    Intel Xeon Gold 6248 (2.50GHz)   40     565GB  Tesla V100   1.5TB    p01-p04
q          HPE     AMD EPYC 7532 (2.40GHz)          64     512GB  None         820GB    q01-q10
n          HPE     Intel Xeon Gold 6148 (2.40GHz)   40     384GB  None         820GB    n01-n13
r          EXXACT  AMD EPYC 7543 (2.80GHz)          64     512GB  None         290GB    r01-r21
s          Dell    AMD EPYC 7543 (2.80GHz)          64     512GB  None         718GB    s01-s44
t          Lenovo  ThinkSystem SR645 V3             64     768GB  None         719GB    t01-t10
centurion  EXXACT  AMD EPYC 7453 (2.70GHz)          56     506GB  A5500        7TB      centurion01-centurion09
amp        EXXACT  Intel Xeon Gold 6258R (2.70GHz)  56     506GB  A5000        7TB      amp01-amp06
Discovery also offers researchers dedicated, specialized head nodes inside the cluster for their own compute. These nodes can be equipped with up to 64 compute cores and 1.5TB of memory.

Operating System:

  • RHEL 8 is used on Discovery, its supporting head nodes, and its compute nodes.

GPU compute nodes are available to free members of Discovery through the gpuq queue. Additional specialized GPU partitions include the following (a sample job submission is sketched after the list):

  • gpuq – High-end compute nodes with A100 GPUs, partitioned via MIG into 40GB slices (free members)

  • a100 – High-end compute nodes with A100 GPUs (paid tier)

  • v100 – High-end compute nodes with V100 GPUs (paid tier)

  • v100_preemptable – High-end compute nodes with V100 GPUs (preemptable)

  • a5000 – Mid-range GPU nodes optimized for general GPU workloads (preemptable)

  • a5500 – Mid-range GPU nodes optimized for general GPU workloads (preemptable)
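
As a concrete illustration, the sketch below requests one GPU on the gpuq partition from the list above and runs nvidia-smi on it. It assumes the cluster's scheduler is Slurm (the scheduler is not named in this section) and that the script is run from a node with the Slurm client tools; the srun flags shown (--partition, --gres, --time) are standard Slurm options, and the GPU count and time limit are illustrative only.

```python
"""Minimal sketch: run a short GPU test job on the gpuq partition.
Assumes Slurm is the scheduler (an assumption, not stated in this section)."""
import subprocess

subprocess.run(
    [
        "srun",
        "--partition=gpuq",  # free-tier A100 MIG partition from the list above
        "--gres=gpu:1",      # request one GPU (one MIG slice on gpuq)
        "--time=00:10:00",   # short, illustrative wall-clock limit
        "nvidia-smi",        # print the GPU(s) visible inside the allocation
    ],
    check=True,  # raise if the submission or the command fails
)
```

For the paid or preemptable partitions, only the --partition value would change (for example, a100, v100, or v100_preemptable).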

Partitions marked (preemptable) run the risk of preemption: jobs may be cancelled to make room for higher-priority work. Use them with caution and make sure your jobs checkpoint their progress whenever possible!
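
If your workload can be restarted, even a very simple checkpoint file protects you against losing work to preemption: save progress periodically and, on startup, resume from the last saved state. The sketch below is generic Python and not tied to any Discovery-specific tooling; the file name checkpoint.json and the step loop are purely illustrative.

```python
"""Minimal checkpointing sketch for preemptable partitions (illustrative only):
periodically persist progress so a preempted job can resume where it left off."""
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file in the job's working directory

def load_step() -> int:
    # Resume from the last saved step, or start fresh if no checkpoint exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    return 0

def save_step(step: int) -> None:
    # Write to a temp file first, then atomically replace the old checkpoint,
    # so a preemption mid-write cannot leave a corrupt file behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CHECKPOINT)

for step in range(load_step(), 10_000):
    ...  # one unit of real work would go here
    if step % 100 == 0:
        save_step(step)
```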
 
Interactive nodes, named andes and polaris, are available for testing and debugging your programs interactively before submitting them to the main cluster through the scheduler.

Node Interconnects

    • All of the compute nodes are connected via 10Gb Ethernet. The cluster itself is connected to Dartmouth’s Science DMZ, facilitating faster data transfer and stronger security.