

Nvidia Unveils New CPU and Plans for Supercomputer


GBM Insights

  • NVIDIA unveils Grace Hopper, their highly anticipated “superchip” that combines the Grace CPU and Hopper H100 GPU in a tightly integrated solution for demanding AI workloads.
  • The strategic fusion of GPU technology with diverse processors allows NVIDIA to penetrate markets that benefit from GPU acceleration.
  • NVIDIA’s DGX GH200 AI Supercomputer emerges as a ground-breaking multi-rack computational cluster, designed for training large AI models.

From GPUs to CPUs

NVIDIA, the technological powerhouse, is kicking off a series of exhilarating AI-related announcements with the revelation that their highly anticipated “superchip”, Grace Hopper, has officially entered full production. Combining the Grace CPU with the formidable Hopper H100 GPU, Grace Hopper represents NVIDIA’s resolute response to discerning customers in search of a more tightly integrated CPU + GPU solution for their demanding workloads, especially those involving AI models.

This remarkable endeavour, several years in the making, showcases NVIDIA’s ingenious strategy of harnessing their pre-existing dominance in the GPU realm while simultaneously delving into the CPU space, ultimately culminating in a ground-breaking semi-integrated CPU/GPU product that outshines any offering from their leading competitors. Armed with their unrivalled expertise in GPUs, NVIDIA is in effect working in reverse, pairing their ground-breaking GPU technology with other types of processors such as CPUs and DPUs. This strategic fusion allows them to penetrate markets that benefit greatly from GPU acceleration but where traditional standalone GPUs might not be the optimal solution.

Nvidia unveiling its new CPU

In this pioneering venture of merging NVIDIA’s HPC CPU with the awe-inspiring GPU, the Hopper GPU stands as the well-known component of the equation. Even though it commenced shipping in appreciable volumes just this year, NVIDIA had already unveiled the astounding Hopper architecture and set the stage for remarkable performance expectations over a year ago. Powered by a colossal 80 billion transistors, the GH100 GPU bestows upon the H100 model nearly 1 PFLOPS of FP16 matrix math throughput, perfectly tailored for handling the most demanding AI workloads. Furthermore, the H100 flaunts an impressive 80GB of cutting-edge HBM3 memory. Undoubtedly, the H100 has already achieved tremendous success, with NVIDIA struggling to meet the overwhelming demand driven by the meteoric rise of ChatGPT and other ingenious generative AI services. However, NVIDIA’s indomitable spirit propels them to forge ahead, vigorously pursuing their mission to penetrate markets where workloads demand the utmost synergy between CPU and GPU.

NVIDIA’s ground-breaking developments continue to captivate the tech world as their latest creation, the Grace CPU, joins forces with the formidable H100. Fresh off the production line a mere few months ago, the Grace CPU is powered by 72 CPU cores based on the Arm Neoverse V2 architecture, accompanied by an impressive 480GB of LPDDR5X memory. However, what truly sets Grace apart is NVIDIA’s bold decision to co-package the CPU with LPDDR5X memory, forsaking the traditional DIMM approach. This innovative move allows for both higher clock speeds and lower power consumption, albeit at the expense of expandability. Consequently, Grace revolutionizes the HPC-class CPU landscape, especially in the realm of Large Language Model (LLM) training, where massive datasets and high memory bandwidths reign supreme.

Beyond the surface, the true magic lies in the data shuffling capabilities, defining the Grace Hopper board as more than just a mere combination of CPU and GPU. NVIDIA’s inclusion of NVLink support, their proprietary high-bandwidth chip interconnect, grants Grace and Hopper an unparalleled interconnect speed, surpassing traditional CPU + GPU configurations reliant on PCIe. The result is an astonishing 900GB/second of bandwidth through the NVLink Chip-to-Chip (C2C) link, enabling Hopper to communicate with Grace at lightning speed, outpacing Grace’s own memory read and write operations.
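To put that interconnect claim in perspective, here is a rough back-of-the-envelope comparison. The 900GB/second figure comes from the article; the PCIe number is an outside approximation for a PCIe 5.0 x16 link (both directions combined), not something the article states.

```python
# Rough numbers behind the bandwidth claim above.
nvlink_c2c_gb_s = 900      # NVLink C2C bandwidth quoted for Grace Hopper
pcie_gen5_x16_gb_s = 128   # approx. PCIe 5.0 x16 aggregate bandwidth (assumption)

speedup = nvlink_c2c_gb_s / pcie_gen5_x16_gb_s
print(f"NVLink C2C offers roughly {speedup:.0f}x the bandwidth of PCIe 5.0 x16")
```

Even with generous assumptions for PCIe, the NVLink C2C link comes out around seven times faster, which is what makes the co-packaged design meaningfully different from a conventional CPU + GPU pairing.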

This exceptional GH200 “superchip,” as christened by NVIDIA, emerges as their answer to the burgeoning AI and HPC markets in the upcoming product cycle. For customers seeking a more localized CPU solution, surpassing the capabilities of a typical CPU + GPU setup, Grace Hopper represents NVIDIA’s most comprehensive computing offering to date. While uncertainties persist regarding the prevalence of the Grace-only superchip, considering NVIDIA’s current AI-focused pursuits, Grace Hopper may very well become the epitome of Grace’s prowess.

Anticipated for release later this year, systems incorporating GH200 chips are poised to revolutionize the tech landscape once again, showcasing NVIDIA’s unwavering commitment to technological advancement.

The DGX GH200 AI Supercomputer is a major step up for Grace Hopper

As Grace Hopper’s launch approaches, NVIDIA wastes no time in setting the stage for its state-of-the-art DGX system, a marvel of technological innovation. In a departure from its predecessors, this system goes beyond the boundaries of a mere DGX; it emerges as a full-fledged multi-rack computational cluster, earning the distinguished title of a “supercomputer” in NVIDIA’s lexicon.

Imagine, if you will, the DGX GH200 AI Supercomputer, a fully integrated, turn-key marvel comprising an awe-inspiring 256-node GH200 cluster. Picture the scene: 24 imposing racks collectively housing 256 GH200 chips, pairing 256 Grace CPUs with 256 H100 GPUs, not to mention the intricate web of networking hardware seamlessly linking these behemoth systems. Collectively, the DGX GH200 cluster boasts a staggering 120TB of CPU-attached memory, an additional 24TB of GPU-attached memory, and an astounding 1 EFLOPS of FP8 throughput, complete with sparsity.
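The aggregate figures above can be sanity-checked from the per-node specs the article gives. The per-GPU FP8 throughput below is an outside approximation for the H100 (with sparsity), not a number from the article.

```python
# Back-of-the-envelope check of the DGX GH200 aggregate figures.
NODES = 256

cpu_mem_per_node_gb = 480   # Grace: 480GB LPDDR5X per CPU (from the article)
gpu_mem_per_node_gb = 96    # H100 variant with all six HBM3 stacks enabled
fp8_pflops_per_gpu = 3.96   # approx. H100 FP8 throughput with sparsity (assumption)

total_cpu_mem_tb = NODES * cpu_mem_per_node_gb / 1000  # quoted as "120TB"
total_gpu_mem_tb = NODES * gpu_mem_per_node_gb / 1000  # quoted as "24TB"
total_fp8_eflops = NODES * fp8_pflops_per_gpu / 1000   # quoted as "1 EFLOPS"

print(f"CPU memory: {total_cpu_mem_tb:.1f} TB")
print(f"GPU memory: {total_gpu_mem_tb:.1f} TB")
print(f"FP8 throughput: {total_fp8_eflops:.2f} EFLOPS")
```

The computed totals (roughly 122.9TB, 24.6TB, and 1 EFLOPS) line up with NVIDIA's rounded marketing figures.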

The nodes within this computational behemoth are seamlessly interconnected via a two-layer networking system centred around the power of NVLink. Immediate communication between GH200 blades is facilitated by 96 local L1 switches, while a secondary layer of connectivity is woven together by 36 L2 switches, tightly binding the L1 switches. And if the impressive scale of this system fails to satiate your appetite for scalability, fear not. NVIDIA has endowed the DGX GH200 cluster with InfiniBand capability, courtesy of their inclusion of ConnectX-7 network adapters.
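The shape of that two-layer fabric can be sketched in a few lines. This is an illustrative model only, not NVIDIA's actual wiring: it assumes nodes are spread round-robin across the L1 switches and that every L1 switch uplinks to every L2 switch.

```python
# Illustrative two-layer switch topology (assumptions, not NVIDIA's design).
from itertools import product

NUM_NODES, NUM_L1, NUM_L2 = 256, 96, 36

# Assign nodes round-robin to L1 switches (assumed placement)
node_to_l1 = {node: node % NUM_L1 for node in range(NUM_NODES)}

# Assume a full mesh of uplinks between the L1 and L2 layers
uplinks = set(product(range(NUM_L1), range(NUM_L2)))

def hops(a: int, b: int) -> int:
    """Switch hops between two nodes: 1 if they share an L1 switch,
    else 3 (L1 -> L2 -> L1) under this model."""
    return 1 if node_to_l1[a] == node_to_l1[b] else 3

print(hops(0, 96))  # nodes on the same L1 switch
print(hops(0, 1))   # nodes on different L1 switches
```

The takeaway is that under such a fabric any two of the 256 nodes are at most a few switch hops apart, which is what lets NVIDIA treat the whole cluster as one large NVLink domain.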

Designed with the purpose of training large AI models in mind, this colossal silicon cluster targets a specific market. NVIDIA leverages their existing hardware expertise and toolsets, taking full advantage of the generous memory capacity and bandwidth provided by the 256-node cluster. The surge in demand for expansive language models has revealed the limitations posed by memory capacity, and NVIDIA aims to present a comprehensive, single-vendor solution for customers harbouring especially colossal models.

While NVIDIA remains discreet regarding specifics, subtle hints suggest that the DGX GH200 cluster pulls out all the stops. The listed memory capacities hint at NVIDIA’s inclusion of their highly sought-after 96GB H100 GPU models, equipped with the normally disabled 6th stack of HBM3 memory. These exclusive variants, currently available only in select products like the specialty H100 NVL PCIe card, will now find their home within certain GH200 configurations, securing NVIDIA’s position as a purveyor of elite silicon.

Naturally, this unparalleled offering from NVIDIA comes with a price tag befitting its stature. Although no official pricing has been disclosed, one can infer from the cost of the HGX H100 board (featuring 8x H100s on a carrier board for $200K) that a single DGX GH200 will command a price in the low eight-digit range. Undoubtedly, the DGX GH200 sets its sights on a select subset of enterprise clientele – those who require extensive large-scale model training and possess the financial prowess to invest in a complete, turn-key solution.
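The inference above is simple arithmetic on the one price the article does cite. Scaling the $200K HGX H100 board (8 GPUs) up to the cluster's 256 GPUs gives a floor for the GPU hardware alone:

```python
# Rough price floor implied by the HGX H100 figure quoted above.
hgx_board_price = 200_000  # 8x H100 carrier board, per the article
gpus_per_board = 8
cluster_gpus = 256

gpu_board_cost = cluster_gpus // gpus_per_board * hgx_board_price
print(f"GPU boards alone: ${gpu_board_cost:,}")
```

That puts the GPU boards alone at $6.4 million, before 256 Grace CPUs, 132 NVLink switches, InfiniBand networking, and integration, which is how one lands in the low eight-digit range for the complete system.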

Yet, the DGX GH200 aims for more than just being a top-tier system offered to well-heeled customers. It serves as the blueprint for empowering hyperscalers to construct their own GH200-based clusters, blazing a trail of technological advancement. NVIDIA understands that demonstrating the system’s capabilities and flawless operation is best achieved by leading the way. While NVIDIA undoubtedly relishes the prospect of selling numerous DGX systems directly, their true triumph lies in widespread adoption of GH200 by hyperscalers, CSPs, and other industry players, ensuring victory for NVIDIA, regardless of the competition.

In the meantime, while the DGX GH200 AI Supercomputer remains a luxury reserved for a privileged few, NVIDIA assures us that these remarkable systems will be available by the end of the year, tantalizingly close for the select group of businesses who can afford to explore the frontiers of AI with unparalleled computational might.


Global Brands Magazine is a leading brands magazine providing opinions and news related to various brands across the world. The company is headquartered in the United Kingdom. A fully autonomous branding magazine, Global Brands Magazine represents an astute source of information from across industries. The magazine provides the reader with up-to-date news, reviews, opinions and polls on leading brands across the globe.
