Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core | NVIDIA Technical Blog
…A three-stage scheduler then alternates between workload and memory objectives, increasing the context-parallel (CP) size for heavier samples as needed.

…compute and memory balance.

Collaboration of cost model, solver, and simulator

A complete…