Showing top 100 results for "performance benchmarking"

People also ask

What metrics should you measure for LLM inference performance?

The prerequisite for sizing and TCO estimation is benchmarking the performance of each deployment unit, e.g., an inference server. The goal of this step is to measure the throughput a system can produce under load, and at what latency. These throughput and latency metrics, together with quality of service requirements (e.g., max latency) and expected peak demand (e.g., max concurrent users or requests per second), help estimate the required hardware, that is, size the deployment. In turn, sizing information is a prerequisite for estimating the total cost of ownership (TCO) of the given s

LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? | NVIDIA Technical Blog
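
The sizing step described in the snippet above can be sketched roughly as follows. This is a minimal illustration, not the method from the linked post: the benchmark points, the latency limit, the peak demand, and the helper names `max_throughput_within_latency` and `instances_needed` are all hypothetical.

```python
import math

# Hypothetical benchmark points for one inference server:
# (concurrency, requests per second, p95 latency in seconds).
BENCHMARK = [
    (1, 2.0, 0.5),
    (4, 7.0, 0.6),
    (16, 20.0, 0.9),
    (64, 32.0, 2.1),
]


def max_throughput_within_latency(points, max_latency_s):
    """Return the highest throughput among points that meet the latency limit."""
    eligible = [rps for _, rps, latency in points if latency <= max_latency_s]
    if not eligible:
        raise ValueError("no benchmark point meets the latency limit")
    return max(eligible)


def instances_needed(peak_rps, per_instance_rps):
    """Round up: a fractional server still has to be provisioned as a whole one."""
    return math.ceil(peak_rps / per_instance_rps)


# Assumed quality-of-service limit (1.0 s) and assumed peak demand (150 req/s).
per_instance = max_throughput_within_latency(BENCHMARK, max_latency_s=1.0)
servers = instances_needed(peak_rps=150.0, per_instance_rps=per_instance)
print(per_instance, servers)
```

With these made-up numbers, the point at concurrency 64 has the highest throughput but violates the latency limit, so sizing is based on the concurrency-16 point instead.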

How do latency-throughput trade-offs affect deployment optimization?

Once raw benchmark data are collected, they are analyzed to gain insight into the various performance characteristics of the system. Read our LLM inference benchmarking guide, where we gather NIM performance data with GenAI-perf and use a simple Python script to analyze the data. For example, performance data provided by GenAI-perf can be used to establish the latency-throughput trade-off curve, shown in Figure 1. Each dot on this graph corresponds to a “concurrency” level, that is, the number of concurrent requests being put into the system at any given time throughout the benchmark process

LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? | NVIDIA Technical Blog
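
As a rough illustration of how one dot per concurrency level turns into a trade-off curve, the sketch below builds the curve from a hypothetical concurrency sweep. It does not use GenAI-perf or its output format; the `SWEEP` values and the `tradeoff_curve` helper are assumptions, and throughput is estimated with Little's law (throughput = concurrency / mean latency) rather than taken from measured request counts.

```python
# Hypothetical sweep: concurrency level -> mean end-to-end latency in seconds.
SWEEP = {1: 0.5, 2: 0.5, 4: 0.8, 8: 1.6}


def tradeoff_curve(sweep):
    """Return (concurrency, latency_s, throughput_rps) tuples, one per dot.

    Throughput is estimated with Little's law for a closed-loop load test:
    throughput = concurrency / mean latency.
    """
    return [(c, lat, c / lat) for c, lat in sorted(sweep.items())]


curve = tradeoff_curve(SWEEP)
for concurrency, latency_s, rps in curve:
    print(f"concurrency={concurrency:>2}  latency={latency_s:.2f}s  throughput={rps:.1f} req/s")
```

In this made-up sweep, going from concurrency 4 to 8 doubles latency without adding throughput, which is the kind of saturation point such a curve is meant to expose.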

Model Quantization: Post-Training Quantization (PTQ) with NVIDIA Model Optimizer

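…the following three from CLIP_benchmark: cifar100 (zero-shot classification), imagenet1k (zero-shot classification), mscoco_captions (zero-shot retrieval). Running PTQ with ModelOpt: The following code sample shows how to use ModelOpt to apply PTQ to a CLIP model in FP8. import…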

May 20, 2026 · Ruixiang Wang

R²D²: Scaling Multimodal Robot Learning with NVIDIA Isaac Lab | NVIDIA Technical Blog

…Delivering GPU-accelerated performance at scale Isaac Lab delivers the massive throughput required for modern robot learning, achieving 135,000 FPS for humanoid locomotion (Unitree H1) and over 150,000 FPS for…

Feb 10, 2026 · Oyindamola Omotuyi

Designing Protein Binders Using the Generative Model Proteina-Complexa | NVIDIA Technical Blog

…Key technologies in Proteina-Complexa Proteina-Complexa performance relies on three distinct technical components: the base generative model, the training datasets, and the integration of inference-time compute scaling. Built on top…

Mar 25, 2026 · Kyle Gion

Building Autonomous Vehicles That Reason with NVIDIA Alpamayo | NVIDIA Technical Blog

…Eventually, labs can seamlessly plug in their own driving, rendering, or traffic models, and compare approaches directly on shared benchmarks. AlpaSim in action AlpaSim is already powering several of our internal research…

Jan 5, 2026 · Marco Pavone

Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo | NVIDIA Technical Blog

…This lets us benchmark our backend implementations against closed-source inference, targeting parity on cache reuse performance. We will be sharing a full write-up and some optimized recipes for deploying both…

Apr 17, 2026 · Ishan Dhanani

Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog

…It ensures that performance and efficiency hold up in production deployments, not just isolated component benchmarks. This technical deep dive explains why AI factories demand a new architectural approach; how NVIDIA Vera…

Jan 5, 2026 · Kyle Aubrey

NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes | NVIDIA Technical Blog

…It also gives us tighter control over CRIU for performance tuning and allows checkpoint artifacts to live in flexible storage backends instead of being embedded into OCI images. Dynamo Snapshot: The workload…

May 27, 2026 · Schwinn Saereesitthipitak

Build Accelerated, Differentiable Computational Physics Code for AI with NVIDIA Warp | NVIDIA Technical Blog

…Warp enables developers to write high-performance kernels as regular Python functions that are JIT-compiled into efficient code for execution on the GPU. Unlike the tensor-based frameworks, in which developers…

Mar 12, 2026 · Sheel Nidhan

Jetson FAQ

…Jetson AGX Orin delivers leading performance in the MLPerf Benchmark for generative AI at the embedded edge. To explore more, please visit the NVIDIA Jetson AI Lab. Where can I buy Jetson…

A Guide to Building Enterprise Search Deep Agents with NVIDIA AI-Q and LangChain

…Architectures"} ] }, { "id": "2", "title": "Performance and Accuracy Trade-offs", "subsections": [ {"id": "2.1", "title": "Factual Accuracy and Hallucination Rates"}, {"id": "2.2", "title": "Latency and Throughput Benchmarks"} ] } ], "queries": [ { "id": "q1", "query": "RAG…

Mar 25, 2026 · Sean Lopp
