Showing top 40 results for "Kubernetes"

People also ask

What is the benefit of running Slurm on Kubernetes?

The operational payoff of running Slurm on Kubernetes comes from the ecosystem. Rather than building and maintaining separate toolchains for GPU management, monitoring, networking, and node lifecycle, you can use the Kubernetes tooling that already exists for these problems. Platform teams manage clusters with declarative YAML, Helm deployments, rolling updates, and Prometheus or Grafana for observability.

Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog
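
As a rough illustration of the declarative management style the answer describes, here is a minimal apps/v1 Deployment with a rolling-update strategy and a GPU resource limit. The names and image are placeholders, not taken from the article:

```yaml
# Hypothetical Deployment illustrating declarative, rolling-update management
# of GPU workloads. Names and image are placeholders, not from the article.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-worker
  labels:
    app: gpu-worker
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1            # update pods one at a time
  selector:
    matchLabels:
      app: gpu-worker
  template:
    metadata:
      labels:
        app: gpu-worker
    spec:
      containers:
        - name: worker
          image: example.com/gpu-worker:1.2.3   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1    # GPU scheduling via the NVIDIA device plugin
```

Applying a change to this file and re-running `kubectl apply` is the workflow the answer alludes to: the cluster converges to the declared state instead of being mutated imperatively.
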
How does NVSentinel work?

NVSentinel is installed in each Kubernetes cluster it monitors. Once deployed, NVSentinel continuously watches nodes for errors, analyzes events, and takes automated actions such as quarantining, draining, labeling, or triggering external remediation workflows. Specific NVSentinel features include continuous monitoring, data aggregation and analysis, and more.

Automate Kubernetes AI Cluster Health with NVSentinel | NVIDIA Technical Blog
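
The quarantine, drain, and label actions mentioned above are standard Kubernetes node operations. A rough sketch of what a quarantined node might look like follows; the taint and label keys are illustrative assumptions, not NVSentinel's actual keys:

```yaml
# Hypothetical view of a quarantined node: cordoned, tainted, and labeled.
# The taint/label keys below are illustrative, not NVSentinel's actual keys.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-17
  labels:
    example.com/health: quarantined   # hypothetical health label
spec:
  unschedulable: true                 # equivalent of `kubectl cordon`
  taints:
    - key: example.com/gpu-fault      # hypothetical taint key
      effect: NoSchedule              # keep new pods off the faulty node
```
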
How does Slinky slurm-operator work?

The Slinky slurm-operator represents each Slurm component (slurmctld for scheduling, slurmdbd for accounting, slurmd for compute workers, slurmrestd for API access) as a Kubernetes Custom Resource Definition (CRD). A Slurm cluster is defined using Custom Resources, and Slinky creates containerized Slurm daemons running in their own pods, configured to belong to their respective cluster. Slinky ensures high availability (HA) of the Slurm control plane (slurmctld) through pod regeneration, with no need for Slurm's native HA mechanism. Configuration changes propagate automatically…

Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog
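
As a rough sketch of the CRD-based model described above, a Slurm cluster might be declared as a Custom Resource like the following. The apiVersion, kind, and field names here are illustrative assumptions, not Slinky's exact schema:

```yaml
# Hypothetical Custom Resource declaring a Slurm cluster for the operator.
# apiVersion/kind and field names are assumptions, not Slinky's exact schema.
apiVersion: slinky.example.com/v1alpha1   # assumed group/version
kind: Cluster
metadata:
  name: slurm-demo
spec:
  controller:          # slurmctld: scheduling
    replicas: 1
  accounting:          # slurmdbd: accounting
    enabled: true
  restapi:             # slurmrestd: API access
    replicas: 1
  nodesets:            # slurmd: compute workers
    - name: gpu
      replicas: 8
```

The operator would watch resources like this and create the corresponding pods, which is the reconciliation pattern the snippet describes.
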
How do aggregated and disaggregated inference differ?

Before diving into Kubernetes manifests, it helps to understand the two inference deployment modes for LLMs. In aggregated serving, a single process (or tightly coupled group of processes) handles the entire inference lifecycle from input to output. Disaggregated serving splits the pipeline into distinct stages such as prefill, decode, and routing, each running as an independent service (see Figure 1 in the source article).

Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog
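
In Kubernetes terms, disaggregated serving maps naturally onto separate Deployments per stage, so each stage scales independently. A minimal sketch, with placeholder names, image, and a hypothetical `--role` flag not taken from the article:

```yaml
# Hypothetical manifests for disaggregated serving: prefill and decode run as
# independent Deployments. Names, image, and the --role flag are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-prefill
spec:
  replicas: 2          # scale prefill for prompt-heavy traffic
  selector:
    matchLabels: {app: llm, stage: prefill}
  template:
    metadata:
      labels: {app: llm, stage: prefill}
    spec:
      containers:
        - name: prefill
          image: example.com/llm-server:latest   # placeholder
          args: ["--role=prefill"]               # hypothetical flag
          resources:
            limits: {nvidia.com/gpu: 1}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-decode
spec:
  replicas: 6          # scale decode for token-generation throughput
  selector:
    matchLabels: {app: llm, stage: decode}
  template:
    metadata:
      labels: {app: llm, stage: decode}
    spec:
      containers:
        - name: decode
          image: example.com/llm-server:latest   # placeholder
          args: ["--role=decode"]                # hypothetical flag
          resources:
            limits: {nvidia.com/gpu: 1}
```

In an aggregated deployment, by contrast, a single Deployment would serve the whole pipeline and could only scale all stages together.
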
developer.nvidia.com › ja-jp › blog

Integrating Physical AI Capabilities into Existing Apps with NVIDIA Omniverse Libraries

…provide rendering and physics capabilities to applications, while ovstorage functions as a unified storage layer. Through a unified API layer, it connects PLM systems or existing repositories directly to the Omniverse ecosystem. This eliminates sync jobs and costly data migrations, enabling USD workflows without moving files. Designed for Kubernetes-ready headless deployment, ovstorage gives you control over the entire architecture, scaling microservices independently to meet production demands without the constraints of a monolithic legacy stack. Getting started: Integrate your existing infrastructure: connect Omniverse to your current storage backend (S3 or…

Apr 8, 2026 · Ashley Goldstein