DeepStream SDK
…With the DeepStream Container Builder and NGC containers, you can easily create scalable, high-performance AI applications managed with Kubernetes and Helm. DeepStream REST-APIs also let you manage multiple parameters at…
Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog

The operational payoff of running Slurm on Kubernetes comes from the ecosystem. Rather than building and maintaining separate toolchains for GPU management, monitoring, networking, and node lifecycle, you can use the Kubernetes tooling that already exists for these problems. Platform teams manage clusters with declarative YAML, Helm deployments, rolling updates, and Prometheus or Grafana for observability.

Slinky slurm-operator represents each Slurm component (slurmctld for scheduling, slurmdbd for accounting, slurmd for compute workers, and slurmrestd for API access) as a Kubernetes Custom Resource Definition (CRD). A Slurm cluster is defined using Custom Resources. Slinky then creates containerized Slurm daemons, each running in its own pod and configured to belong to its cluster. Slinky keeps the Slurm control plane (slurmctld) highly available (HA) by regenerating its pod, so Slurm's native HA mechanism is not needed. Configuration changes propagate automatically: Kubernetes synch…
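The Slinky passage above describes keeping slurmctld available by regenerating its pod from the declared cluster state. The following is a minimal sketch of the general Kubernetes reconcile pattern behind that idea. It assumes an in-memory model. ClusterSpec, Observed, and reconcile are hypothetical names and are not part of slurm-operator. A real operator would read and write state through the Kubernetes API rather than Python sets.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the Kubernetes reconcile pattern.
# This is NOT slurm-operator code; names and structure are illustrative.
COMPONENTS = ("slurmctld", "slurmdbd", "slurmd", "slurmrestd")


@dataclass
class ClusterSpec:
    """Desired state, loosely analogous to a Slurm cluster Custom Resource."""
    name: str
    replicas: dict[str, int]  # component -> desired pod count


@dataclass
class Observed:
    """Observed state: names of pods currently running, per component."""
    pods: dict[str, set[str]] = field(default_factory=dict)


def reconcile(spec: ClusterSpec, observed: Observed) -> list[str]:
    """Drive observed state toward the spec; return the actions taken."""
    actions: list[str] = []
    for component in COMPONENTS:
        want = spec.replicas.get(component, 0)
        have = observed.pods.setdefault(component, set())
        # Recreate missing pods, e.g. after the slurmctld pod is lost.
        index = 0
        while len(have) < want:
            pod = f"{spec.name}-{component}-{index}"
            if pod not in have:
                have.add(pod)
                actions.append(f"create {pod}")
            index += 1
        # Delete surplus pods (highest names in sort order) when the desired count shrinks.
        for pod in sorted(have)[want:]:
            have.remove(pod)
            actions.append(f"delete {pod}")
    return actions


if __name__ == "__main__":
    spec = ClusterSpec("slurm", {"slurmctld": 1, "slurmdbd": 1, "slurmd": 2, "slurmrestd": 1})
    state = Observed()
    print(reconcile(spec, state))    # initial creation of all pods
    state.pods["slurmctld"].clear()  # simulate losing the controller pod
    print(reconcile(spec, state))    # recreates only the slurmctld pod
```

Because reconcile compares desired and observed state on every pass, it is idempotent. Once the two match, it returns no actions. Removing the slurmctld entry, which simulates a lost pod, causes the next pass to recreate only that pod.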
Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters | NVIDIA Technical Blog

The GPU Usage Monitor is an open-source project that deploys a fully integrated GPU observability stack for Kubernetes. SRE and platform teams do not have to assemble and configure individual components. Instead, the GPU Usage Monitor combines DCGM Exporter, kube-state-metrics, Prometheus, and Grafana into a single deployment, complete with pre-built dashboards designed specifically for GPU-accelerated workloads. The design principle is operational simplicity: a single helm install command yields actionable GPU visibility within minutes, with no custom dashboard authoring or scrape configurat…

Automate Kubernetes AI Cluster Health with NVSentinel | NVIDIA Technical Blog

NVSentinel is installed in each Kubernetes cluster you run. Once deployed, NVSentinel continuously watches nodes for errors, analyzes events, and takes automated actions such as quarantining, draining, labeling, or triggering external remediation workflows. Specific NVSentinel features include continuous monitoring, data aggregation and analysis, and more.
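The NVSentinel passage above describes a watch, analyze, and act loop. The following is a minimal sketch of the "analyze events, then choose actions" step. It aggregates events per node and maps them to the action types the passage lists. NVSentinel's actual event types, rules, and APIs are not given in this excerpt. The event categories ("fatal", "degraded", "warning"), the warning threshold, and the names NodeEvent and plan_actions are all hypothetical. In a real cluster, quarantining and draining correspond to Kubernetes cordon and drain operations (for example, kubectl cordon and kubectl drain).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    QUARANTINE = "quarantine"  # e.g. cordon: stop new pods landing on the node
    DRAIN = "drain"            # evict running workloads from the node
    LABEL = "label"            # mark node state for other tooling to see
    REMEDIATE = "remediate"    # hand off to an external remediation workflow


@dataclass(frozen=True)
class NodeEvent:
    node: str
    kind: str  # hypothetical categories: "fatal", "degraded", "warning"


def plan_actions(events: list[NodeEvent], warning_threshold: int = 3) -> dict[str, list[Action]]:
    """Aggregate events per node, then map each node's counts to actions.

    The rules here are illustrative assumptions, not NVSentinel's policy.
    """
    counts: dict[str, Counter] = {}
    for ev in events:
        counts.setdefault(ev.node, Counter())[ev.kind] += 1

    plan: dict[str, list[Action]] = {}
    for node, c in sorted(counts.items()):
        actions: list[Action] = []
        if c["fatal"]:  # Counter returns 0 for missing keys
            actions += [Action.QUARANTINE, Action.DRAIN, Action.REMEDIATE]
        elif c["degraded"]:
            actions += [Action.QUARANTINE, Action.LABEL]
        elif c["warning"] >= warning_threshold:
            actions.append(Action.LABEL)
        if actions:
            plan[node] = actions
    return plan
```

Separating aggregation from actuation keeps the policy easy to test offline. A node with a single fatal event is quarantined, drained, and handed to remediation. Isolated warnings below the threshold produce no action.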
…Nsight Compute is included in the CUDA Toolkit and is available as a standalone download. NVIDIA Nsight Cloud includes updates to the Nsight Operator for Kubernetes along with Nsight Streamer Kubernetes and…
…Before NVIDIA, he built and scaled cloud-native platforms at UiPath, Omnitracs (SmartDrive), and SAP, spanning Kubernetes controllers, GitOps architecture, CI/CD standardization, and production reliability across on-prem and cloud environments…
…17 MIN READ Mar 23, 2026 Deploying Disaggregated LLM Inference Workloads on Kubernetes As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its…