Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog
…More experts, same cost. By compressing tokens before they reach the experts, latent MoE enables the model to consult 4x as many experts for the exact same computational cost as running one…
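The arithmetic behind "4x as many experts, same cost" can be sketched numerically: if an expert's matmul cost scales with the square of its width, compressing tokens to half width makes each expert 4x cheaper, so four times as many can be consulted within the same budget. The sketch below is a minimal illustration of this idea only; all names, shapes, and the gating scheme are assumptions, not Nemotron 3 Super's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 512    # full token hidden size (illustrative)
d_latent = 256   # compressed latent size: 2x narrower tokens
n_tokens = 8
top_k = 2        # experts consulted per token in the baseline

def expert_flops(width, n_active):
    # One square width x width matmul per active expert.
    return n_active * width * width

# Baseline MoE: top_k experts act on full-width tokens.
baseline_cost = expert_flops(d_model, top_k)
# Latent MoE: 2x narrower experts cost 4x less each,
# so 4x as many fit in the same compute budget.
latent_cost = expert_flops(d_latent, 4 * top_k)
assert baseline_cost == latent_cost

# Hypothetical latent-MoE forward pass: compress, route, decompress.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

def latent_moe(x, n_experts=8, k=4 * top_k // 2):
    experts = [rng.standard_normal((d_latent, d_latent)) / np.sqrt(d_latent)
               for _ in range(n_experts)]
    gate_logits = rng.standard_normal((x.shape[0], n_experts))
    z = x @ W_down                      # compress tokens before the experts
    out = np.zeros_like(z)
    for t in range(x.shape[0]):
        top = np.argsort(gate_logits[t])[-k:]      # pick top-k experts
        w = np.exp(gate_logits[t][top])
        w /= w.sum()                               # softmax over selected
        for g, e in zip(w, top):
            out[t] += g * (z[t] @ experts[e])      # cheap latent-space matmul
    return out @ W_up                   # decompress back to model width

x = rng.standard_normal((n_tokens, d_model))
y = latent_moe(x)
print(y.shape)  # (8, 512)
```

The cost identity holds because the compression is quadratic in width: halving the token dimension quarters each expert's matmul, exactly offsetting a 4x increase in active experts.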
