Showing top 9 results for "Emulation & performance targets"

People also ask

How do pruning and distillation impact model performance?

Experimental results for pruning and distillation from Qwen3 8B using Model Optimizer show that the Qwen3 Depth Pruned 6B model is 30% faster than the Qwen3 4B model and also performs better on the MMLU (Massive Multitask Language Understanding) benchmark. Depth pruning was applied to reduce the model from 36 to 24 layers, resulting in a 6B model, using one NVIDIA H100 80 GB HBM3. The pruned model is distilled from Qwen3-8B using the OptimalScale/ClimbMix data processed from the nvidia/ClimbMix pretraining dataset. The experiment uses 25% of the data, which is approximately 90B tokens. Distillatio

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog

NVIDIA CUDA

… This flexibility lets developers integrate GPU computing into any layer of their software stack to achieve optimal functionality and performance. …

CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features | NVIDIA Technical Blog

… The benefits of emulation are most apparent in key APIs for QR, LU, and Cholesky factorizations. To learn more about the latest advances in emulation techniques from NVIDIA, see Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS. …

Mar 9, 2026 · Jonathan Bentz

Advancing AI Infrastructure for Agentic AI with NVIDIA DOCA In-Silicon Security | NVIDIA Technical Blog

… Through zero-copy memory access techniques, this inspection occurs without disrupting application or AI performance. …

Jun 1, 2026 · Ofir Arkin

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog

… How do pruning and distillation impact model performance? …

Oct 7, 2025 · Max Xu

Accelerate Clean, Modular, Nuclear Reactor Design with AI Physics | NVIDIA Technical Blog

… The combination of scalability, extensibility, and optimized performance provided by PhysicsNeMo enables the development of surrogate models that deliver near real-time predictions without sacrificing fidelity. …

Apr 17, 2026 · Mark Hobbs

NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates | NVIDIA Technical Blog

… For performance enthusiasts, the newly launched NVIDIA CompileIQ compiler auto-tuning framework delivers up to a 15% speedup on critical kernels like GEMM and attention. …

May 26, 2026 · Jonathan Bentz

DynoSim: Simulating the Pareto Frontier | NVIDIA Technical Blog

… Around 200 seconds, performance drops sharply, and by 300 seconds the system is stuck behind the traffic burst, with p90 TTFT reaching 242 seconds. This suggests users should optimize cold start time to stay below 200 seconds for best performance. …

May 29, 2026 · Yongming Ding

Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog

… More information on how Ozaki FP64 emulation is an effective way to achieve true FP64-level accuracy on low-precision AI hardware while delivering impressive performance gains can be found in our blog on Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS. …

Jan 5, 2026 · Kyle Aubrey

Vulkan Driver Support

… Possible workarounds: disable flipping in nvidia-settings (uncheck "Allow Flipping" in the "OpenGL Settings" panel); disable UBB (run 'nvidia-xconfig --no-ubb'); use a composited desktop. Bug fixes: NVX multiview per view attributes and geometry passthrough shaders; Fix subpass dstSubpass=VK SUBPASS EXTERNA… …
