CUDA-X
…cuda.parallel: Standardized primitives for distributed and local parallel patterns such as sort, scan, and reduction, optimized for the latest NVIDIA GPU architectures.
Data Processing Libraries: GPU-accelerated libraries to accelerate data…
In addition to Muon, NVIDIA supports many other optimizers for the research community to explore, including MOP (Momentum Orthogonalized by Polar decomposition), described as the ultimate form of orthogonalized optimizer, and REKLS, an advanced SOAP variant that updates its eigenbasis at every step using eigendecomposition plus a KL correction.
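To make the idea behind MOP concrete, here is a minimal NumPy sketch of one orthogonalized-momentum step. It assumes the optimizer replaces the momentum matrix with its orthogonal polar factor (the nearest semi-orthogonal matrix in Frobenius norm), computed exactly via SVD. Muon approximates this same factor with a Newton–Schulz iteration. The names `polar_factor` and `mop_step` and the `lr`/`beta` defaults are hypothetical; any shape-dependent scaling, Nesterov momentum, or distributed sharding used in Megatron's actual implementation is omitted.

```python
import numpy as np


def polar_factor(m: np.ndarray) -> np.ndarray:
    """Orthogonal polar factor U @ Vt of m (assumes exact SVD, not Newton-Schulz)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def mop_step(weight: np.ndarray, grad: np.ndarray, momentum: np.ndarray,
             lr: float = 0.02, beta: float = 0.95):
    """Hypothetical MOP-style update: accumulate momentum, then orthogonalize it."""
    momentum = beta * momentum + grad          # plain (non-Nesterov) momentum
    update = polar_factor(momentum)            # all singular values set to 1
    weight = weight - lr * update              # no extra shape-based scaling
    return weight, momentum


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
g = rng.standard_normal((64, 32))
w, m = mop_step(w, g, np.zeros_like(w))
```

Because the polar factor discards the singular values of the momentum, every direction in the update has the same magnitude. This equalization is what orthogonalized optimizers rely on, and the SVD here is simply the exact, more expensive way to obtain it.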
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron | NVIDIA Technical Blog