
NVIDIA Blackwell is now operational in data centers: NVLINK upgraded to 1.8 TB/s, first FP4 GenAI image released

2024-08-24


IT Home reported on August 24 that NVIDIA invited select media to a briefing, demonstrating the Blackwell platform to technology reporters for the first time. NVIDIA will also attend Hot Chips 2024, held from August 25 to 27, to show how the Blackwell platform is used in data centers.

Denies Blackwell launch delay

At this briefing, NVIDIA refuted reports that Blackwell's launch would be delayed and shared more information about the data center behemoth.

At the briefing, NVIDIA showed Blackwell running in one of its data centers and emphasized that the platform is progressing as planned and will ship to customers later this year.

It said claims that Blackwell has some kind of defect or problem preventing a market release this year are unfounded.

About Blackwell

NVIDIA said Blackwell is not just a chip but a platform. Like Hopper, Blackwell encompasses a wide range of designs for data center, cloud computing, and artificial intelligence customers, and each Blackwell product combines several different chips.

IT Home lists those chips below:

Blackwell GPU

Grace CPU

NVLINK Switch Chip

Bluefield-3

ConnectX-7

ConnectX-8

Spectrum-4

Quantum-3

Blackwell Cable Tray

NVIDIA also shared new images of the various trays in the Blackwell product line. These are the first such images to be made public, and they showcase the extensive engineering expertise required to design a next-generation data center platform.

Targeting trillion-parameter AI models

Blackwell is designed to meet the needs of modern AI and delivers excellent performance for large language models such as Meta's 405B-parameter Llama 3.1. As LLMs grow larger and gain more parameters, data centers will need more compute and lower latency to serve them.
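
For a sense of scale, a quick back-of-envelope estimate (our illustration, not a figure from the briefing) shows why a model of this size forces multi-GPU deployments: the weights alone outgrow a single GPU's memory.

```python
# Rough weight-memory estimate for a 405B-parameter model. The byte
# counts per format are standard; everything else is illustrative.
params = 405e9
bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    print(f"{fmt}: ~{params * nbytes / 1e9:,.0f} GB of weights")

# FP16: ~810 GB, FP8: ~405 GB, FP4: ~203 GB -- even before activations
# and KV cache, which is why inference is spread across many GPUs.
```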

Multi-GPU Inference Method

Multi-GPU inference performs computations across multiple GPUs to achieve low latency and high throughput, but the approach brings its own complexity: in a multi-GPU environment, each GPU must send its computation results to the other GPUs at every layer, which requires high-bandwidth GPU-to-GPU communication.

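As a sketch of what that per-layer communication looks like in practice, the snippet below shards a single linear layer across GPUs using PyTorch collectives; the shapes and layer choice are illustrative assumptions, not NVIDIA's implementation.

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> tp_sketch.py
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

hidden = 8192
shard = hidden // dist.get_world_size()

x_shard = torch.randn(1, shard, device="cuda")       # this GPU's input slice
w_shard = torch.randn(shard, hidden, device="cuda")  # this GPU's weight rows

# Each GPU computes a partial result for the layer, then the partials are
# summed across all GPUs. This all-reduce repeats at every layer, so its
# cost is set by GPU-to-GPU (NVLINK) bandwidth.
partial = x_shard @ w_shard
dist.all_reduce(partial, op=dist.ReduceOp.SUM)

dist.destroy_process_group()
```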

Faster NVLINK switches

With Blackwell, NVIDIA introduced a faster NVLINK switch that doubles the fabric bandwidth to 1.8 TB/s. The NVLINK switch chip itself is an 800 mm² die built on TSMC's 4NP node, and it scales NVLINK to 72 GPUs in a GB200 NVL72 rack.

The chip provides 7.2 TB/s of all-to-all bidirectional bandwidth through 72 ports, along with 3.6 TFLOPS of in-network compute. Each NVLINK switch tray carries two such switches, providing up to 14.4 TB/s of total bandwidth.
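
Those figures are internally consistent; the quick check below (our arithmetic, not NVIDIA's) derives the per-port and per-tray numbers.

```python
# Per-port and per-tray bandwidth implied by the figures above.
total_tb_s = 7.2                # all-to-all bidirectional, per switch chip
ports = 72
per_port_gb_s = total_tb_s / ports * 1000    # 100 GB/s per port

switches_per_tray = 2
tray_tb_s = switches_per_tray * total_tb_s   # 14.4 TB/s per tray

print(f"{per_port_gb_s:.0f} GB/s per port, {tray_tb_s:.1f} TB/s per tray")
```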

Water Cooling

NVIDIA is using water cooling to improve performance and efficiency. The Grace Blackwell GB200 and B200 systems will use these new liquid-cooling solutions, which can reduce data center facility power costs by up to 28%.

First AI image generated using FP4 computation

NVIDIA also shared the world's first AI image generated using FP4 compute. The bunny image produced by the FP4-quantized model is nearly identical to the FP16 model's output, but was generated faster.

The image was produced with Stable Diffusion running on Blackwell as part of MLPerf. The challenge in reducing precision from FP16 to FP4 is that some accuracy is lost.
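
To illustrate what dropping to 4-bit floats involves, here is a minimal sketch that rounds a tensor onto the 8-value E2M1 grid commonly used for FP4; the grid and per-tensor scaling are our assumptions, not details NVIDIA disclosed.

```python
import numpy as np

# Non-negative magnitudes representable in FP4 (E2M1); a sign bit covers
# the negatives. This grid is an assumption based on common FP4 formats.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x):
    """Per-tensor symmetric rounding onto the FP4 magnitude grid."""
    scale = np.abs(x).max() / FP4_GRID[-1]    # map the largest |x| to 6.0
    mags = np.abs(x) / scale
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx], scale

def dequantize_fp4(q, scale):
    return q * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_fp4(w)
print("max abs error:", np.abs(w - dequantize_fp4(q, s)).max())
```

For well-scaled tensors the rounding error stays small, which is consistent with the FP4-generated image looking very close to the FP16 output.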