Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel | NVIDIA Technical Blog
…Tong Liu is a DevTech engineer at NVIDIA, specializing in optimizing Mixture-of-Experts (MoE) large language model training and CUDA kernel development. He has contributed to key features in…
