Search

Showing top 66 results for "first-party performance"

People also ask

How does Slurm block scheduling optimize performance?

An important subtlety that often surprises users is that Slurm can assign multiple segments of the same job to the same block. Segments are essential for optimizing performance around the specific locality requirements of the workload. Tensor Parallelism (TP) may require small, tight segments to keep latency-sensitive communication on the high-speed NVLink fabric. Expert Parallelism (EP), by contrast, may require larger segment sizes to ensure that all-to-all collective operations are always performed within a single NVLink domain. Using a large segment value such as --segment=16

Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling | NVIDIA Technical Blog
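The Python model below shows how a job's segments can be placed into blocks. It is a simplified sketch, not Slurm's actual scheduler. It assumes that each segment must fit entirely inside one block, that the job's node count is a multiple of the segment size, and that each GB200 NVL72 block exposes 18 nodes. `place_segments` is a hypothetical helper that uses simple first-fit placement.

```python
from typing import List, Optional


def place_segments(free_nodes_per_block: List[int], job_nodes: int,
                   segment: int) -> Optional[List[int]]:
    """Hypothetical first-fit model of segment placement.

    Splits a job into equal-sized segments and places each segment entirely
    inside a single block. Several segments of the same job may share a block.
    Returns one block index per segment, or None if the job does not fit.
    """
    if segment <= 0 or job_nodes % segment != 0:
        raise ValueError("job node count must be a positive multiple of the segment size")
    free = list(free_nodes_per_block)  # copy so the caller's list is untouched
    placement: List[int] = []
    for _ in range(job_nodes // segment):
        # First-fit: choose the first block with room for a whole segment.
        for idx, avail in enumerate(free):
            if avail >= segment:
                free[idx] -= segment
                placement.append(idx)
                break
        else:
            # No block can hold this segment intact, so the job cannot start.
            return None
    return placement


if __name__ == "__main__":
    two_empty_blocks = [18, 18]  # assumed: two idle 18-node NVL72 blocks
    print(place_segments(two_empty_blocks, 32, 2))   # small, TP-style segments
    print(place_segments(two_empty_blocks, 32, 8))   # two segments per block
    print(place_segments(two_empty_blocks, 32, 16))  # one segment per block
    print(place_segments([18, 10], 32, 16))          # second segment cannot fit
```

With two empty 18-node blocks, a 32-node job with a segment size of 8 is placed as `[0, 0, 1, 1]`, so two segments share each block. A segment size of 16 places one segment per block and leaves 2 nodes in each block for other work. If one block has only 10 free nodes, the second 16-node segment has nowhere to go, and the model returns `None`.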

Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP | NVIDIA Technical Blog

How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale | NVIDIA Technical Blog

Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments | NVIDIA Technical Blog

Building Autonomous Vehicles That Reason with NVIDIA Alpamayo | NVIDIA Technical Blog

Revolutionizing AI-Driven Material Discovery Using NVIDIA ALCHEMI | NVIDIA Technical Blog
