NVIDIA Technical Blog
…expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond.... 9 MIN READ Feb 02, 2026 Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert…
In addition to Muon, NVIDIA also supports many other optimizers for the research community to explore, including MOP (Momentum Orthogonalized by Polar decomposition), described as the ultimate form of orthogonalized optimizer, and REKLS, an advanced SOAP variant that updates the eigenbasis at every step using eigendecomposition plus KL correction.
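To make the idea of orthogonalizing momentum by polar decomposition concrete, here is a minimal NumPy sketch. It is not NVIDIA's implementation: the function names `polar_orthogonalize` and `momentum_polar_step`, the hyperparameters `lr` and `beta`, and the plain heavy-ball momentum rule are all assumptions made for illustration. It relies only on the standard fact that for a matrix with SVD M = U S V^T, the orthogonal polar factor is U V^T.

```python
import numpy as np


def polar_orthogonalize(m):
    """Return the orthogonal polar factor of m.

    With the thin SVD m = U @ diag(s) @ Vt, the polar factor is U @ Vt,
    i.e. m with all of its singular values replaced by 1.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def momentum_polar_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical update step (illustrative, not a library API).

    Accumulates heavy-ball momentum, orthogonalizes it with the polar
    factor, and applies the result as the weight update.
    """
    momentum = beta * momentum + grad
    update = polar_orthogonalize(momentum)
    return weight - lr * update, momentum


# Toy 4x3 weight matrix and gradient; shapes and values are arbitrary.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 3))
buf = np.zeros_like(w)

w_new, buf = momentum_polar_step(w, g, buf)
q = polar_orthogonalize(buf)  # the update direction that was applied
```

The exact SVD is used here for clarity. Production optimizers in this family often approximate the polar factor with iterative schemes that run better on GPUs, a detail this sketch does not attempt to reproduce.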
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron | NVIDIA Technical Blog