Introducing Nvidia’s DGX SuperPOD: A Powerful Solution for Trillion-Parameter AI Models

Nvidia, the technology company best known for its GPUs, has unveiled its most powerful system yet: the DGX SuperPOD. The new system is part of Nvidia’s extensive hardware and software rollout at the Nvidia GTC conference. The DGX SuperPOD is powered by Blackwell, Nvidia’s next-generation GPU architecture for AI acceleration, which is designed to support AI models with a trillion parameters.

The DGX SuperPOD is not a single rack server; it combines multiple DGX GB200 systems. Each DGX GB200 system features 36 Nvidia GB200 Superchips, which together comprise 36 Nvidia Grace CPUs and 72 Nvidia Blackwell GPUs, connected as a single supercomputer via fifth-generation Nvidia NVLink. The SuperPOD can be configured with eight or more DGX GB200 systems and can scale to tens of thousands of GB200 Superchips connected via Nvidia Quantum InfiniBand. With 240 terabytes of memory and 11.5 exaflops of AI supercomputing power, the DGX SuperPOD is built for large language model training and generative AI inference at massive scale.
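To put those per-system figures in perspective, here is a minimal back-of-the-envelope tally of the eight-system base configuration described above. The one-CPU, two-GPU split per Superchip follows directly from the quoted 36-CPU/72-GPU count per system; larger configurations scale these totals linearly.

```python
# Back-of-the-envelope tally for an eight-system DGX SuperPOD,
# using only the figures quoted in the article.

SYSTEMS = 8                  # minimum SuperPOD configuration described
SUPERCHIPS_PER_SYSTEM = 36   # GB200 Superchips per DGX GB200 system
CPUS_PER_SUPERCHIP = 1       # each GB200 pairs one Grace CPU...
GPUS_PER_SUPERCHIP = 2       # ...with two Blackwell GPUs (36 CPUs / 72 GPUs per system)

superchips = SYSTEMS * SUPERCHIPS_PER_SYSTEM        # 288
grace_cpus = superchips * CPUS_PER_SUPERCHIP        # 288
blackwell_gpus = superchips * GPUS_PER_SUPERCHIP    # 576

print(f"{superchips} GB200 Superchips: "
      f"{grace_cpus} Grace CPUs + {blackwell_gpus} Blackwell GPUs")
```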

One of the key features that set the DGX SuperPOD apart is its advanced networking and data processing units. The system uses the newly announced Nvidia Quantum-X800 InfiniBand networking technology, providing up to 1,800 gigabytes per second of bandwidth to each GPU in the platform. It also integrates Nvidia BlueField-3 DPUs and the fifth generation of the Nvidia NVLink interconnect. Additionally, the new SuperPOD includes fourth-generation Nvidia Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology, delivering 14.4 teraflops of in-network computing.
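As a rough order-of-magnitude illustration of that per-GPU bandwidth figure, the sketch below asks how long a single 1,800 GB/s link would take to stream the SuperPOD's full 240 terabytes of memory. This is purely illustrative arithmetic on the numbers quoted above; it ignores protocol overhead and the fact that real traffic fans out across many links in parallel.

```python
# Illustrative only: time for one GPU link to stream the full memory pool.

BANDWIDTH_GB_PER_S = 1_800   # per-GPU bandwidth quoted above
MEMORY_TB = 240              # total SuperPOD memory quoted above

memory_gb = MEMORY_TB * 1_000              # decimal terabytes -> gigabytes
seconds = memory_gb / BANDWIDTH_GB_PER_S   # ~133 seconds

print(f"{memory_gb:,} GB / {BANDWIDTH_GB_PER_S:,} GB/s ≈ {seconds:.0f} s")
```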

The GB200-based DGX systems will also be available on the Nvidia DGX Cloud service, with availability on Amazon Web Services (AWS), Google Cloud, and Oracle Cloud. DGX Cloud is a cloud service designed to provide the latest Nvidia technology for AI research and development, both for Nvidia’s own use and for customers. The GB200 will also power Project Ceiba, a supercomputer being developed in collaboration with AWS. Project Ceiba will now support 20,000 GPUs and deliver over 400 exaflops of AI power.

Overall, the introduction of Nvidia’s DGX SuperPOD represents a significant milestone in the world of AI computing. With its unparalleled power and scalability, the DGX SuperPOD opens up new possibilities for large-scale AI model training and inference. The integration of advanced networking and data processing units further enhances its capabilities, making it a world-class supercomputing platform.