NVIDIA’s H100 GPUs & The AI Frenzy; A Rundown of the Current Situation

We are all well aware of NVIDIA and the AI "gold mine" that has recently taken everyone by storm. At the center of it all stand Team Green's H100 AI GPUs, which are simply the most sought-after piece of AI hardware at the moment, with everyone trying to get their hands on them to power their AI needs.

NVIDIA H100 GPU Is The Best Chip For AI At The Moment & Everyone Wants More of Them

This article isn't news as such; rather, it brings readers up to speed on the current state of the AI industry and how companies are building their "future" around the H100 GPUs.

Before we get to the crux of the article, a quick recap is in order. At the start of 2022, everything was going along with the usual developments. Then, with November's arrival, a revolutionary application named "ChatGPT" emerged and laid the foundation for the AI hype. While we cannot call ChatGPT the origin of the AI boom, we can certainly say it acted as a catalyst. Its arrival pushed competitors like Microsoft and Google into an AI race to release their own generative AI applications.

You might ask: where does NVIDIA come in here? The backbone of generative AI is lengthy LLM (Large Language Model) training runs, and this is where NVIDIA's AI GPUs come in clutch. We won't dive into tech specs and factual bits since that makes things dull and no fun to read. However, if you are interested in the specifics, we are dropping a table below highlighting every AI GPU release from NVIDIA, dating back to the Tesla models.

NVIDIA HPC / AI GPUs

| Graphics Card | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU | GH100 (Hopper) | GH100 (Hopper) | GA100 (Ampere) | GA100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | 814 mm² | 814 mm² | 826 mm² | 826 mm² | 815 mm² | 815 mm² | 610 mm² | 610 mm² | 601 mm² | 551 mm² |
| SMs | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| FP32 CUDA Cores Per SM | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores Per SM | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | TBD | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 3958 TOPs | 3200 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity) | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 1979 TFLOPs | 1600 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity) | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 67 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 34 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 80 GB HBM3 @ 3.0 Gbps | Up To 80 GB HBM2e @ 2.0 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| L2 Cache Size | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| TDP | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |
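As a sanity check on the table, the "standard" (non-Tensor) FP32 figures follow directly from the FP32 CUDA core count and the boost clock, assuming each core retires one fused multiply-add (2 FLOPs) per cycle. Below is a minimal Python sketch of that calculation; the H100 is left out since its boost clock is listed as TBD above.

```python
# Minimal sketch: derive the "standard" (non-Tensor) FP32 throughput from
# core count and boost clock. Assumes 1 FMA (= 2 FLOPs) per core per cycle.

def fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    # cores * GHz * 2 gives GFLOPS; dividing by 1000 converts to TFLOPS.
    return cuda_cores * boost_clock_ghz * 2 / 1000

gpus = {
    "A100 (SXM4)":       (6912, 1.410),  # table: 19.5 TFLOPs standard
    "Tesla V100 (SXM2)": (5120, 1.530),  # table: 15.7 TFLOPs
    "Tesla P100 (SXM2)": (3584, 1.480),  # table: 10.6 TFLOPs
}

for name, (cores, clock_ghz) in gpus.items():
    print(f"{name}: ~{fp32_tflops(cores, clock_ghz):.1f} TFLOPS FP32")
```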

The question still isn't answered: why the H100? Well, we are getting there. NVIDIA's H100 is the company's highest-end offering, providing immense computing capability. One might argue that the bump in performance brings a higher price, but companies tend to order in huge volumes, and "performance per watt" is the priority here. Compared to the A100, the Hopper-based H100 delivers roughly 3.5 times the 16-bit inference performance and 2.3 times the 16-bit training performance, making it the obvious choice.
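To put "performance per watt" in rough numbers, here is a minimal sketch comparing the peak FP16 Tensor throughput and TDP figures from the table above (both FP16 values include sparsity). This is only a back-of-the-envelope peak-spec comparison; the 3.5x/2.3x real-world gains also come from the Transformer Engine, memory bandwidth, and software, none of which this captures.

```python
# Back-of-the-envelope FP16-per-watt comparison using peak Tensor throughput
# (with sparsity) and TDP from the spec table above. Treat this as a rough
# illustration only; real workloads depend on far more than peak specs.

cards = {
    "H100 (SXM5)": {"fp16_tflops": 1979, "tdp_w": 700},
    "A100 (SXM4)": {"fp16_tflops": 624,  "tdp_w": 400},
}

per_watt = {name: c["fp16_tflops"] / c["tdp_w"] for name, c in cards.items()}
for name, value in per_watt.items():
    print(f"{name}: {value:.2f} TFLOPS per watt")

ratio = per_watt["H100 (SXM5)"] / per_watt["A100 (SXM4)"]
print(f"H100 advantage: ~{ratio:.1f}x more FP16 throughput per watt")
```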

We hope the superiority of the H100 GPU is evident by now. Moving on to our next segment: why is there a shortage? The answer involves several factors, the first being the vast number of GPUs needed to train a single model. Astonishingly, OpenAI's GPT-4 model reportedly required around 10,000 to 25,000 A100 GPUs (the H100 hadn't been released at the time).
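To give a sense of where GPU counts like that come from, here is a minimal back-of-the-envelope sketch using the widely used approximation that training compute is roughly 6 × parameters × tokens. The model size, token count, run length, and utilization below are hypothetical placeholders (GPT-4's actual configuration is undisclosed); the point is only that a single large training run quickly lands in the tens of thousands of GPUs.

```python
# Back-of-the-envelope estimate of GPUs needed for one large training run,
# using the common "compute ≈ 6 × parameters × tokens" rule of thumb.
# All model/run figures below are HYPOTHETICAL illustrative values,
# not GPT-4's (undisclosed) configuration.

params      = 500e9   # hypothetical 500B-parameter model
tokens      = 5e12    # hypothetical 5T training tokens
train_days  = 90      # hypothetical wall-clock budget
peak_tflops = 312     # A100 dense FP16 Tensor peak (from the table above)
utilization = 0.35    # assumed real-world FLOPs utilization

total_flops   = 6 * params * tokens
flops_per_gpu = peak_tflops * 1e12 * utilization * train_days * 86_400
gpus_needed   = total_flops / flops_per_gpu

print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"GPUs needed for a {train_days}-day run: ~{gpus_needed:,.0f}")
# → roughly 18,000 A100s under these assumptions, which is why a single
#   training run can consume GPUs by the tens of thousands.
```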

Modern AI startups such as Inflection AI and CoreWeave have acquired humongous quantities of H100s, worth billions of dollars in total. This shows that a single company requires huge volumes even to train a basic-to-decent AI model, which is why demand has been tremendous.

If you question NVIDIA's approach, you might say, "NVIDIA could simply increase production to cope with demand." That is much easier said than done. Unlike gaming GPUs, NVIDIA's AI GPUs require extensive manufacturing processes, most of which are handled by the Taiwanese semiconductor behemoth TSMC. TSMC is the exclusive supplier of NVIDIA's AI GPUs, handling every stage from wafer fabrication to advanced packaging.

The H100 is built on TSMC's 4N process, a revamped member of the 5nm family. NVIDIA is now the biggest customer of this node; Apple previously relied on the 5nm family for its A15 Bionic chipset but has since moved on with the A16 Bionic. Of all the relevant steps, the production of HBM memory is the most complicated, since it involves sophisticated equipment that only a few manufacturers currently operate.

CoWoS Packaging Utilized in NVIDIA's H100

HBM suppliers include SK Hynix, Micron, and Samsung, though TSMC works with a limited set of them and it isn't publicly known which ones. Apart from HBM, TSMC also has trouble maintaining CoWoS (Chip-on-Wafer-on-Substrate) capacity, a 2.5D packaging process that is a crucial stage in producing the H100. TSMC can't keep pace with NVIDIA's demand, due to which order backlogs have reached new heights, with deliveries reportedly slipping to December.

So when people use the word GPU shortage, they're talking about a shortage of, or a backlog of, some component on the board, not the GPU itself. It's just limited worldwide manufacturing of these things... but we forecast what people want and what the world can build.

-Charlie Boyle, NVIDIA's DGX VP and GM (via Computerbase.de)

We have left out many specifics, but going into detail would deviate from our primary aim, which is to give the average reader a picture of the situation. For now, we don't believe the shortage will ease; if anything, it is expected to worsen. However, we could see the landscape shift after AMD's decision to consolidate its position in the AI market.

DigiTimes reports that "TSMC seems to be particularly optimistic about demand for AMD's upcoming Instinct MI300 series, saying it will be half of Nvidia's total output of CoWoS-packaged chips." That could distribute the workload across more companies. Still, judging by Team Green's grip on the market so far, unseating it would require a seriously compelling offering from AMD.

Summing up, NVIDIA's H100 GPUs are carrying the AI hype to new heights, which is why this frenzy surrounds them. We aimed to give readers a general idea of the whole scenario. Credit to GPU Utils for the idea behind this article; make sure to check out their report as well.

Written by Muhammad Zuhair


Reference: https://wccftech.com
