The DGX cluster is a pilot project for studying GPU computing, container technology, cloud computing, and related topics. It consists of an NVIDIA DGX Station as the master node and five compute nodes, and it uses Kubernetes as its job manager and scheduler. The DGX Station has 20 physical CPU cores and 4 Tesla Volta V100 GPUs with 64GB of memory. There are two types of compute nodes:

  • Three NVIDIA DGX-1 servers, each with 40 physical CPU cores and 8 Tesla Volta V100 GPUs with 512GB of memory.
  • Two Dell PowerEdge C4140 machines, each with 40 physical CPU cores and 4 Tesla Volta V100 GPUs with 256GB of memory.
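
To give a sense of the aggregate capacity, the node counts above can be tallied with a short script. This is only an illustrative sketch: it takes the listed figures as per-node values, and the names NodeType, COMPUTE_NODES, MASTER and totals are hypothetical, not part of any cluster tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One kind of node in the cluster (illustrative structure, not a cluster API)."""
    name: str
    count: int       # number of nodes of this type
    cpu_cores: int   # physical CPU cores per node
    gpus: int        # V100 GPUs per node


# Figures copied from the description above, read as per-node values (assumption).
COMPUTE_NODES = [
    NodeType("NVIDIA DGX-1", count=3, cpu_cores=40, gpus=8),
    NodeType("Dell PowerEdge C4140", count=2, cpu_cores=40, gpus=4),
]
MASTER = NodeType("NVIDIA DGX Station", count=1, cpu_cores=20, gpus=4)


def totals(nodes):
    """Return (total physical CPU cores, total GPUs) across the given node types."""
    cores = sum(n.count * n.cpu_cores for n in nodes)
    gpus = sum(n.count * n.gpus for n in nodes)
    return cores, gpus


if __name__ == "__main__":
    cores, gpus = totals(COMPUTE_NODES)
    print(f"Compute nodes: {cores} CPU cores, {gpus} GPUs")
    cores, gpus = totals(COMPUTE_NODES + [MASTER])
    print(f"Including master: {cores} CPU cores, {gpus} GPUs")
```

Under these assumptions the compute nodes together provide 200 physical CPU cores and 32 GPUs, or 220 cores and 36 GPUs when the master node is counted. Whether the master node actually accepts jobs is not stated here, so the second figure is an upper bound.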

Get an account on the DGX Cluster.