How does Slurm block scheduling optimize performance?

A subtlety that often surprises users is that Slurm can assign multiple segments of the same job to the same block. Segments are essential for tuning performance to the locality requirements of the workload: Tensor Parallelism (TP) may require small, tight segments to keep latency-sensitive communication on the high-speed NVLink fabric, while Expert Parallelism (EP) may require larger segments to ensure that all-to-all collective operations always run within a single NVLink domain. Using a large segment value such as --segment=16

Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling | NVIDIA Technical Blog
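
The point about several segments of one job landing in the same block can be illustrated with a toy model. This is a minimal sketch and not Slurm's actual placement algorithm: the function name place_segments, the first-fit greedy strategy, the block names, the 18-node block size, and the rule that the job size must be a multiple of the segment size are all assumptions made for illustration. The only behaviour it is meant to show is that each segment must fit entirely inside one block, while nothing stops two segments from sharing a block.

```python
def place_segments(free_nodes_per_block, job_nodes, segment_size):
    """Toy first-fit placement of a job's segments onto blocks.

    Illustrative only: this is NOT how Slurm chooses blocks. Every name and
    rule here is a hypothetical simplification.
    """
    # Simplifying assumption of this toy model, not a documented Slurm rule.
    if segment_size <= 0 or job_nodes % segment_size != 0:
        raise ValueError("job_nodes must be a positive multiple of segment_size")

    free = dict(free_nodes_per_block)  # copy so the caller's dict is untouched
    placement = []                     # block chosen for each segment, in order

    for _ in range(job_nodes // segment_size):
        for block, available in free.items():
            # A segment is never split: it needs a block with enough free nodes.
            if available >= segment_size:
                free[block] = available - segment_size
                placement.append(block)
                break
        else:
            raise RuntimeError("no block has room for another whole segment")

    return placement


# Two hypothetical blocks of 18 free nodes each.
blocks = {"block0": 18, "block1": 18}

# Small segments: two segments of the same job share each block.
print(place_segments(blocks, job_nodes=32, segment_size=8))
# → ['block0', 'block0', 'block1', 'block1']

# Large segments: only one segment fits per block.
print(place_segments(blocks, job_nodes=32, segment_size=16))
# → ['block0', 'block1']
```

In this model a smaller segment size gives the placement more freedom to pack a job into partially used blocks, while a larger one guarantees that more nodes share a block, which is the trade-off the snippet describes for TP and EP.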