NVIDIA Blog
…AI Grids to Optimize Inference on Distributed Networks March 17, 2026 New SemiAnalysis InferenceX Data Shows NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Costs for Agentic AI…
NVIDIA doubled Blackwell performance through continuous software optimization, refining kernels, compiler paths, and inference runtimes so the same hardware delivers significantly more useful AI throughput over time. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT LLM library was market-leading, but NVIDIA’s teams and the community have significantly optimized TensorRT LLM for open-source large language models. The TensorRT LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone. Through advance
InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed — it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope
Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.
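To make these metrics concrete, here is a minimal sketch of how cost per million tokens and throughput per megawatt can be derived from basic measurements. The function names and the input figures are hypothetical and chosen only for illustration; they are not taken from InferenceMAX or from any NVIDIA tool, and a real deployment would also account for utilization, networking and cooling overhead.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a steady rate.

    hourly_cost_usd: assumed all-in cost of the system per hour (hypothetical input).
    tokens_per_second: sustained output throughput of that same system.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def tokens_per_second_per_megawatt(tokens_per_second: float, power_watts: float) -> float:
    """Throughput normalized to one megawatt of power draw."""
    return tokens_per_second / power_watts * 1_000_000


if __name__ == "__main__":
    # Illustrative numbers only, not benchmark results.
    print(cost_per_million_tokens(hourly_cost_usd=3.60, tokens_per_second=1000.0))
    print(tokens_per_second_per_megawatt(tokens_per_second=1000.0, power_watts=500.0))
```

Because both metrics divide by throughput or power, a system that doubles tokens per second at the same hourly cost and power draw halves its cost per million tokens and doubles its throughput per megawatt.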
InferenceMAX uses the Pareto frontier — a curve that shows the best trade-offs between different factors, such as data center throughput and responsiveness — to map performance. But it’s more than a chart. It reflects how NVIDIA Blackwell balances the full spectrum of production priorities: cost, energy efficiency, throughput and responsiveness. That balance enables the highest ROI across real-world workloads. Systems that optimize for just one mode or scenario may show peak performance in isolation, but those economics don’t scale. Blackwell’s full-stack design delivers efficiency and
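As a rough illustration of the Pareto frontier idea, the sketch below filters a set of measured configurations down to those that no other configuration beats on both axes. It assumes each point is a pair of (responsiveness in tokens per second per user, throughput in tokens per second per GPU), with higher being better on both; the function name and sample data are hypothetical and do not reflect how InferenceMAX itself computes its curves.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (tokens/s per user, tokens/s per GPU); higher is better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    A point is dominated when another point is at least as good on both
    axes and differs from it. Sorting by the first axis (descending) lets
    a single pass keep only points that improve on the best second-axis
    value seen so far.
    """
    frontier: List[Point] = []
    best_throughput = float("-inf")
    for per_user, per_gpu in sorted(points, key=lambda p: (-p[0], -p[1])):
        if per_gpu > best_throughput:
            frontier.append((per_user, per_gpu))
            best_throughput = per_gpu
    return frontier


if __name__ == "__main__":
    # Made-up configurations, e.g. different batch sizes on one system.
    configs = [(10, 100), (20, 80), (30, 50), (15, 70), (25, 40)]
    print(pareto_frontier(configs))
```

In this made-up data, the configurations (15, 70) and (25, 40) fall off the frontier because another configuration is better on both responsiveness and throughput.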
…performance across a range of systems — from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules…
…100x performance for vision AI applications and up to 50x performance for vector databases. Power-Efficient Performance for Enterprise Data Centers For enterprises looking to optimize performance, efficiency and costs, RTX PRO…
…March 17, 2026 New SemiAnalysis InferenceX Data Shows NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Costs for Agentic AI The NVIDIA Blackwell platform has been widely adopted…
NVIDIA’s AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out…
…NVIDIA Clara for Healthcare and Life Sciences To lower costs and deliver treatments faster, NVIDIA is launching new Clara AI models that bridge the gap between digital discovery and real-world medicine…
…By embedding NVIDIA Blackwell into EDA, manufacturing and process control, NVIDIA is helping the semiconductor industry deliver the next generation of high-performance chips faster. Learn more about the latest AI advancements…
…But getting there can be complicated — setups take space, hardware takes planning and downloads take time. GeForce NOW removes the barriers and delivers instant access to games, high-performance GeForce RTX power…
…NVIDIA has collaborated with Black Forest Labs to release an FP8 version, optimized for the fastest performance and optimal memory consumption on RTX GPUs. NVIDIA NemoClaw — NVIDIA Optimizations for OpenClaw AI developers…
…AI for Good Healthcare and Life Sciences Inception NVIDIA Isaac Sim Omniverse Enterprise Related News NVIDIA Jetson Brings Agentic AI to the Physical World Agentic AI is getting physical. At COMPUTEX on…