Search: kernel hardware requirements

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel | NVIDIA Technical Blog

…Today’s MoE models impose higher and more complex requirements for parallel strategies, low-precision computing, and dynamic resource scheduling. They also need optimization to maximize the potential of next-generation hardware…

Feb 2, 2026 · Fan Yu

Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise Manageability | NVIDIA Technical Blog

…This includes GPU telemetry, kernel logs, hardware events, PCIe state, firmware information, and crash diagnostics. The bundle is produced as an artifact on-device; the tool returns a pointer through stdout so…

Jun 9, 2026 · Maitri Taneja

Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2 | NVIDIA Technical Blog

…This includes configuring I/Os, clock settings, fan control, power profiles, or any other module for a specific hardware design. Tasks that previously required weeks of manual effort can be handled by…

Jun 2, 2026 · Peilun Tsai

Building a Zero-Trust Architecture for Confidential AI Factories | NVIDIA Technical Blog

…Confidential Containers (CoCo) operationalize it for Kubernetes. CoCo enables Kubernetes pods to run inside hardware-backed TEEs without requiring application rewrites. Instead of sharing the host kernel, each pod is transparently wrapped…

Mar 23, 2026 · Hema Bontha

Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile | NVIDIA Technical Blog

…Defining the kernel interface In cuTile, the @ct.kernel decorator marks a Python function as a GPU kernel. We pass compile-time constants using ct.Constant[T] type annotations: import math import…

Mar 5, 2026 · Alessandro Morari

Enhancing Distributed Inference Performance with the NVIDIA Inference Transfer Library | NVIDIA Technical Blog

…Due to the requirement for ultra-low-latency communication for intermediate activations between stages, these transfers are typically initiated by the GPU through optimized kernels, referred to as device-side APIs for…

Mar 9, 2026 · Seonghee Lee

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT LLM | NVIDIA Technical Blog

…This post explains a technique known as Skip Softmax, a hardware-friendly, drop-in sparse attention method that accelerates inference without any retraining. Read on to learn how Skip Softmax delivers up…

Dec 16, 2025 · Laikh Tewari

CUDA-X

…CUTLASS: Modular C++ templates and Python DSLs for building high-performance kernels targeting NVIDIA Tensor Cores. FlashInfer: GPU-accelerated kernel library, accessible via Python API for inference, optimizing attention, MoEs, GEMMs, comms…

Using Accelerated Computing to Live-Steer Scientific Experiments at Massive Research Facilities | NVIDIA Technical Blog

…Live data processing—to promptly send alerts to telescopes around the world and steer observation decisions—requires accelerated computing. These steps require advanced image calibration, basis constructions, convolutions, subpixel differencing, pattern extraction…

Feb 10, 2026 · Quynh L. Nguyen

NVIDIA IGX Thor Powers Industrial, Medical, and Robotics Edge AI Applications | NVIDIA Technical Blog

…The software stacks are also aligned—kernel, user space, and AI libraries share the same versions—delivering a consistent experience across Jetson and IGX. For teams with deeper customization requirements, NVIDIA is…

Mar 23, 2026 · Suhas Hariharapura Sheshadri