Amazon SageMaker HyperPod Slurm clusters now support specifying minimum capacity requirements with continuous provisioning - AWS
… This is particularly useful for distributed training workloads using frameworks such as PyTorch FSDP, Megatron-LM, or NVIDIA NeMo, where training jobs are commonly configured with a fixed number of participating nodes and may not start efficiently or correctly with partial cluster capacity. …
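As a rough illustration of why partial capacity is a problem for such jobs, the sketch below models a launch gate for a job configured with a fixed node count. The function names (`world_size`, `can_start`) and the example numbers are hypothetical; this is not the HyperPod or Slurm API, only a minimal model of the all-or-nothing requirement described above.

```python
def world_size(nodes: int, gpus_per_node: int) -> int:
    """Total number of ranks a job expects, fixed at configuration time."""
    if nodes <= 0 or gpus_per_node <= 0:
        raise ValueError("nodes and gpus_per_node must be positive")
    return nodes * gpus_per_node


def can_start(required_nodes: int, available_nodes: int) -> bool:
    """Return True only when the full fixed node count is available.

    A job configured for `required_nodes` is not launched on fewer nodes,
    which mirrors the minimum-capacity idea in the announcement.
    """
    if required_nodes <= 0:
        raise ValueError("required_nodes must be positive")
    if available_nodes < 0:
        raise ValueError("available_nodes cannot be negative")
    return available_nodes >= required_nodes


if __name__ == "__main__":
    # Hypothetical job: 16 nodes with 8 GPUs each.
    required = 16
    print("expected ranks:", world_size(required, 8))
    for available in (12, 16):
        print(available, "nodes available, start:", can_start(required, available))
```

In this model, a cluster that has provisioned 12 of 16 nodes would hold the job back rather than launch it with a world size that does not match its configuration.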