The Blackwells are superior on paper, but there's some "Nvidia Math" involved: when they report performance in press announcements, they don't usually mention the precision. Yes, the Blackwells are more than double the speed of the Hopper H100s, but that's comparing FP8 to FP4 (the H100s can't do native FP4). That's great for certain workloads, but not the majority.
What's more interesting is the VRAM speed. The 6000 Pro has 96 GB of GPU memory and 1.8 TB/s of bandwidth; the H100 has the same amount, but with HBM3 at 4.9 TB/s. That roughly 2.7x increase has a big influence on the overall performance of the system.
Lastly, if it works, the NVLink-C2C does 900 GB/s of bandwidth between the cards, so about 5x what a pair of 6000 Pros could do over PCIe 5. Big LLMs need well over the 96 GB on a single card, so the interconnect becomes the bottleneck.
For example, here are benchmarks on the RTX 6000 Pro using the GPT-OSS-120B model, where it generates 145 tokens/sec, and I get 195 tokens/sec on the GH200. https://www.reddit.com/r/LocalLLaMA/comments/1mm7azs/openai_...
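To put those numbers side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted in this thread (1.8 vs 4.9 TB/s, 145 vs 195 tokens/sec), none of which are independently verified, and the variable names are just illustrative. It compares the bandwidth ratio to the measured throughput ratio, nothing more.

```python
# Figures as quoted in this thread; treat them as assumptions, not verified specs.
RTX_6000_PRO_BW_TBS = 1.8   # memory bandwidth, TB/s
GH200_BW_TBS = 4.9          # memory bandwidth, TB/s
RTX_6000_PRO_TOKS = 145     # GPT-OSS-120B, tokens/sec
GH200_TOKS = 195            # GPT-OSS-120B, tokens/sec

# How much faster the GH200 memory is, and how much faster it generates tokens.
bandwidth_ratio = GH200_BW_TBS / RTX_6000_PRO_BW_TBS
throughput_ratio = GH200_TOKS / RTX_6000_PRO_TOKS

# Share of the bandwidth advantage that shows up in this one benchmark.
realized_fraction = throughput_ratio / bandwidth_ratio

print(f"bandwidth ratio:   {bandwidth_ratio:.2f}x")
print(f"throughput ratio:  {throughput_ratio:.2f}x")
print(f"realized fraction: {realized_fraction:.2f}")
```

So on this particular benchmark only about half of the bandwidth advantage turns into tokens/sec, which suggests something other than raw memory bandwidth is also limiting things. A single benchmark run is not enough to say what.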
The NVLink is definitely a strong point; I missed that detail. For LLM inference specifically it matters fairly little iirc, but for training it might.
GPUs have such a short lifespan these days that it is really important to compare new vs. used.
I had 4x 4090s that I bought for about $2200 each in early 2023. I sold 3 of them to help pay for the GH200, and got $2K each.