Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel | NVIDIA Technical Blog
…Multidimensional parallelism strategies with support for tensor parallelism (TP), sequence parallelism, pipeline parallelism (PP), MoE expert parallelism (EP), and other strategies that can be flexibly combined to accommodate diverse and complex training…
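As a rough illustration of how these parallelism dimensions might be combined, the sketch below checks whether a set of sizes fits a given GPU count. It rests on assumptions that the excerpt does not state: that the world size factors as TP × PP × DP (data parallel), and that expert-parallel groups are carved out of the data-parallel dimension, so EP must divide DP. The function name `data_parallel_size` is hypothetical and is not an API of any library.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, ep: int = 1) -> int:
    """Return the data-parallel (DP) size implied by the other dimensions.

    Assumptions (not from the source text):
      * world_size == tp * pp * dp
      * expert-parallel groups are formed inside the DP dimension,
        so ep must divide dp evenly.
    """
    for name, size in (("world_size", world_size), ("tp", tp), ("pp", pp), ("ep", ep)):
        if size < 1:
            raise ValueError(f"{name} must be >= 1, got {size}")

    # GPUs consumed by one model replica under TP and PP.
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel}"
        )

    dp = world_size // model_parallel

    # Under the stated assumption, experts are sharded across DP ranks.
    if dp % ep != 0:
        raise ValueError(f"ep={ep} does not divide the data-parallel size dp={dp}")
    return dp


if __name__ == "__main__":
    # 64 GPUs with TP=2 and PP=4 leave 8 data-parallel replicas; EP=4 divides 8.
    print(data_parallel_size(64, tp=2, pp=4, ep=4))
```

Under these assumptions, a layout is rejected early when the sizes do not multiply out to the GPU count, which is usually cheaper to catch before launching a job than after process groups are created. Real frameworks may apply different or additional constraints.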