Making Softmax More Efficient with NVIDIA Blackwell Ultra | NVIDIA Technical Blog
…
Alleviating the softmax bottleneck in Blackwell Ultra
By doubling the throughput of the Special Function Unit (SFU) for exponentials in the Blackwell Ultra architecture, NVIDIA alleviates this bottleneck and enables a more balanced and efficient processing pipeline. …
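To make the bottleneck concrete: softmax needs one exponential for every input score, so its exponential work scales with the number of scores being normalized. The Python sketch below is illustrative only and is not NVIDIA's implementation. It computes a numerically stable softmax and writes e^x as 2^(x · log2 e), because GPU hardware exponentials are typically base 2 and CUDA's fast exponential is usually built that way. The function names `softmax_exp2` and `exp_count_for_attention` are hypothetical. The count helper assumes dense attention, with one softmax over a seq_len × seq_len score matrix per head; the excerpt itself does not specify a workload.

```python
import math

# log2(e) = 1 / ln(2); used to rewrite e^x as 2^(x * log2 e)
LOG2E = 1.0 / math.log(2.0)


def softmax_exp2(scores):
    """Numerically stable softmax that computes e^x as 2^(x * log2 e).

    Every element costs exactly one exponential, which is the operation a GPU
    typically dispatches to its special function units.
    """
    m = max(scores)  # subtract the max so no exponent overflows
    exps = [2.0 ** ((s - m) * LOG2E) for s in scores]  # one exp per element
    total = sum(exps)
    return [e / total for e in exps]


def exp_count_for_attention(seq_len, num_heads):
    """Exponentials for one softmax pass over dense attention scores.

    Hypothetical model: each head produces a seq_len x seq_len score matrix,
    and every score needs one exponential.
    """
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    print(softmax_exp2([1.0, 2.0, 3.0]))
    print(exp_count_for_attention(4096, 32))
```

Under these assumptions, the exponential count grows quadratically with sequence length. Doubling the rate at which those exponentials are computed therefore directly shrinks the softmax portion of each step.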