Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core | NVIDIA Technical Blog
…With this approach, Dynamic-CP support is added to all schedulers by inserting a single wrapper, keeping the original scheduling code largely intact.

Broadcasting across pipeline stages and extending packedSeqParams

Since num…