How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s Sovereign Models | NVIDIA Technical Blog
… How disaggregated serving removes the critical path and boosts throughput 1.5x

Despite kernel and scheduling improvements, our profiling indicated that inter-GPU communication for token distribution (expert parallelism) remained on the critical path. …
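
To make the communication cost concrete, here is a minimal Python sketch of the token dispatch that expert parallelism implies. It is an illustration under stated assumptions, not the implementation described in the post: it assumes top-1 routing (each token goes to exactly one expert), and the names `dispatch_matrix`, `cross_gpu_tokens`, `expert_gpu`, `token_home_gpu`, and `token_expert` are hypothetical. The toy placement of four experts on two GPUs is made up for the example.

```python
def dispatch_matrix(token_home_gpu, token_expert, expert_gpu, num_gpus):
    """Count tokens sent from each source GPU to each destination GPU.

    Assumes top-1 routing: every token is routed to exactly one expert.
    """
    matrix = [[0] * num_gpus for _ in range(num_gpus)]
    for src, expert in zip(token_home_gpu, token_expert):
        dst = expert_gpu[expert]  # GPU that hosts the chosen expert
        matrix[src][dst] += 1
    return matrix


def cross_gpu_tokens(matrix):
    """Sum the off-diagonal entries: tokens that must cross a GPU boundary."""
    return sum(
        count
        for src, row in enumerate(matrix)
        for dst, count in enumerate(row)
        if src != dst
    )


# Hypothetical toy setup: experts 0-1 live on GPU 0, experts 2-3 on GPU 1.
expert_gpu = [0, 0, 1, 1]
# Three tokens start on GPU 0 and three on GPU 1.
token_home_gpu = [0, 0, 0, 1, 1, 1]
# Expert chosen by the router for each token (made-up values).
token_expert = [0, 2, 3, 1, 2, 0]

matrix = dispatch_matrix(token_home_gpu, token_expert, expert_gpu, num_gpus=2)
remote = cross_gpu_tokens(matrix)
print(matrix, remote)
```

In this toy case four of the six tokens have to move to another GPU before their expert can run. That transfer is the inter-GPU exchange the profiling refers to, and it sits on the critical path whenever the expert computation cannot start until the tokens arrive.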