NVIDIA Platform Delivers Lowest Token Cost Enabled by Extreme Co-Design | NVIDIA Technical Blog
… Scale-out inference with NVIDIA Quantum-X800 InfiniBand platform enables millions of tokens per second

NVIDIA also set new throughput records at scale on the DeepSeek-R1 model in the offline and server scenarios by submitting results using four GB300 NVL72 systems interconnected with NVIDIA Quantum… …