NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents | NVIDIA Technical Blog
…Architectural innovations include hybrid Mamba-Transformer layers for efficient long-context handling, NVFP4 quantization for cross-architecture GPU deployment with up to 5x higher throughput, LatentMoE for expert routing, and multi-token…