OSMO Platform
…Do I need Kubernetes or infrastructure expertise to use OSMO? No. Workflows are defined in simple YAML files, and OSMO abstracts the underlying infrastructure. Users don’t need to write Kubernetes manifests…
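As a purely hypothetical illustration of that claim (OSMO's actual workflow schema is not shown in this excerpt, so every field name below is an assumption), such a YAML file might resemble:

```yaml
# Hypothetical OSMO-style workflow definition.
# All field names here are illustrative assumptions, not OSMO's actual schema.
name: sim-training-run
resources:
  gpus: 8                     # requested GPU count; the platform maps this to cluster capacity
tasks:
  - name: train
    image: nvcr.io/nvidia/pytorch:24.05-py3   # placeholder container image
    command: ["python", "train.py", "--epochs", "10"]
```

The point of the abstraction is that the user declares the workload and its resource needs; pod specs, scheduling, and node selection stay out of sight.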
Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog
The operational payoff of running Slurm on Kubernetes comes from the ecosystem. Rather than building and maintaining separate toolchains for GPU management, monitoring, networking, and node lifecycle, you can use the Kubernetes tooling that already exists for these problems. Platform teams manage clusters with declarative YAML, Helm deployments, rolling updates, and Prometheus or Grafana for observability.
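As a concrete illustration (names and image are placeholders), a GPU workload managed this way is just another Deployment: rolling updates and Prometheus scraping use the same machinery as any other service:

```yaml
# Minimal GPU workload managed declaratively; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-worker
spec:
  replicas: 4
  strategy:
    type: RollingUpdate              # rolling updates come for free
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: gpu-worker
  template:
    metadata:
      labels:
        app: gpu-worker
      annotations:
        prometheus.io/scrape: "true" # picked up by an existing Prometheus setup
        prometheus.io/port: "9400"
    spec:
      containers:
        - name: worker
          image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
          resources:
            limits:
              nvidia.com/gpu: 1      # GPU scheduling via the NVIDIA device plugin
```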
Automate Kubernetes AI Cluster Health with NVSentinel | NVIDIA Technical Blog
NVSentinel is installed in each Kubernetes cluster. Once deployed, NVSentinel continuously watches nodes for errors, analyzes events, and takes automated actions such as quarantining, draining, labeling, or triggering external remediation workflows. Specific NVSentinel features include continuous monitoring, data aggregation and analysis, and more…
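Those remediation actions map onto standard Kubernetes node primitives. As a purely illustrative sketch (the label and taint keys below are hypothetical, not NVSentinel's actual keys), a quarantined node might look like:

```yaml
# Illustrative view of a quarantined node. The label and taint keys are
# hypothetical placeholders, not NVSentinel's actual keys.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-17
  labels:
    example.com/health: quarantined   # hypothetical label applied after fault detection
spec:
  unschedulable: true                 # cordoned: no new pods will be scheduled here
  taints:
    - key: example.com/gpu-fault      # hypothetical taint; evicts pods without a matching toleration
      effect: NoExecute
```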
Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog
Slinky slurm-operator represents each Slurm component (slurmctld for scheduling, slurmdbd for accounting, slurmd for compute workers, slurmrestd for API access) as a Kubernetes Custom Resource Definition (CRD). A Slurm cluster is defined using Custom Resources, and Slinky creates containerized Slurm daemons running in their own pods, configured to belong to their respective cluster. Slinky ensures high availability (HA) of the Slurm control plane (slurmctld) through pod regeneration, with no need for Slurm's native HA mechanism. Configuration changes propagate automatically: Kubernetes synch…
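A cluster definition might then look something like the following sketch; the apiVersion, kind, and field names here are assumptions for illustration rather than Slinky's exact schema:

```yaml
# Sketch of defining a Slurm cluster as a Custom Resource. The apiVersion,
# kind, and field names are illustrative assumptions, not Slinky's exact schema.
apiVersion: slinky.slurm.net/v1alpha1
kind: Cluster
metadata:
  name: slurm-prod
spec:
  controller:            # slurmctld; the operator regenerates the pod on failure
    replicas: 1
  accounting:            # slurmdbd
    enabled: true
  restapi:               # slurmrestd
    enabled: true
  compute:               # slurmd workers
    nodesets:
      - name: gpu-nodes
        replicas: 16
```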
Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog
Before diving into Kubernetes manifests, it helps to understand the two inference deployment modes for LLMs. In aggregated serving, a single process (or a tightly coupled group of processes) handles the entire inference lifecycle from input to output. Disaggregated serving splits the pipeline into distinct stages, such as prefill, decode, and routing, each running as an independent service.
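On Kubernetes, that split maps naturally onto one Deployment and Service per stage, so each stage scales independently. A minimal sketch for the prefill stage, with placeholder names and images (decode and routing would follow the same pattern):

```yaml
# Sketch of the disaggregated pattern: each stage is its own Deployment and
# Service. Names and image are placeholders, not a specific framework's manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefill
spec:
  replicas: 2                        # scale prefill independently of decode
  selector:
    matchLabels: { stage: prefill }
  template:
    metadata:
      labels: { stage: prefill }
    spec:
      containers:
        - name: prefill
          image: registry.example.com/llm-prefill:latest
          resources:
            limits: { nvidia.com/gpu: 1 }
---
apiVersion: v1
kind: Service
metadata:
  name: prefill
spec:
  selector: { stage: prefill }       # the routing stage reaches prefill by service name
  ports:
    - port: 8000
```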
…This was deployed in a cluster of nodes managed by Kubernetes. To learn more, see NVIDIA NIM LLM with NVIDIA Run:ai and Vanilla Kubernetes for Enterprise RA. Infrastructure: Identical benchmarks were…
…A modular component of Dynamo that simplifies deploying hierarchical gang-scheduled and topology-aware AI workloads on Kubernetes. AI Perf: a comprehensive benchmarking tool that measures the performance of generative AI models…
…In production Kubernetes environments, the mismatch between model memory requirements and GPU capacity creates inefficiencies. Lightweight automatic speech recognition (ASR) or text-to-speech (TTS) models may require only 10 GB…
…NVIDIA Nsight Streamer improvements, now available on NGC, for viewing reports on remote headless servers. NVIDIA Nsight Operator for Kubernetes improvements, releasing soon on NGC. Learn more here and apply…
NVIDIA Cloud Functions (NVCF)
NVIDIA Cloud Functions (NVCF) is a unified API layer for running and scaling inference, fine-tuning, batch, and simulation workloads across Kubernetes clusters. It seamlessly integrates into NVIDIA…
…Integrated domain power service elevates power management to a first-class scheduling primitive, enabling power-aware workload placement and optimization across Slurm and Kubernetes environments using Run:ai, supporting MAX-P/MAX…
…Customization of Qwen3.5 for domain-specific tasks is facilitated by the NVIDIA NeMo framework, which provides PyTorch-native high-throughput fine-tuning, Slurm and Kubernetes multinode support, and Hugging Face integration…
…For enterprise Kubernetes deployments, the SDK documentation includes an NGINX Ingress configuration that supports multiple CloudXR servers with load balancing. Ensure your firewall allows TCP port 49100 (signaling), UDP port 47998 (media…
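Because a standard Ingress resource only routes HTTP, raw TCP and UDP ports like these are typically exposed through ingress-nginx's TCP and UDP services ConfigMaps. A sketch, with the namespace and Service names as assumptions (consult the SDK documentation for the exact configuration):

```yaml
# Sketch: exposing CloudXR's signaling (TCP 49100) and media (UDP 47998)
# ports via ingress-nginx's TCP/UDP ConfigMaps. The "cloudxr" namespace and
# "cloudxr-server" Service name are assumptions for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "49100": "cloudxr/cloudxr-server:49100"   # signaling channel
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx
data:
  "47998": "cloudxr/cloudxr-server:47998"   # media stream
```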
…Integrating AIConfigurator in the AI Serving Stack for automated deployments
The AI Serving Stack, built on the Alibaba Container Service for Kubernetes (ACK), is an end-to-end solution for efficient and…