Showing top 68 results for "AI cost and performance"

People also ask

How Did NVIDIA Double Blackwell Performance Through Continuous Software Optimizations to Lower Token Cost?

NVIDIA doubled Blackwell performance through continuous software optimization: refining kernels, compiler paths, and inference runtimes so the same hardware delivers more AI throughput over time. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT LLM library was already market-leading, and NVIDIA’s teams and the community have since significantly optimized TensorRT LLM for open-source large language models. The TensorRT LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone. Through advance

Telecommunications Archives

What Is InferenceMAX v1 and Why Does It Matter for AI Economics?

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed; it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope

How Does Blackwell Achieve 15x Lower Cost Per Token and 10x Higher Efficiency?

Metrics like tokens per watt, cost per million tokens and tokens per second (TPS) per user matter as much as throughput. For power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. Cost per token is central to evaluating AI model efficiency because it directly affects operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.
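
To make the metrics above concrete, here is a minimal Python sketch of how cost per million tokens and throughput per megawatt could be computed. The function names and every number in it (hourly system cost, tokens per second, power draw) are hypothetical values chosen for illustration; they are not NVIDIA or InferenceMAX figures.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def throughput_per_megawatt(tokens_per_second: float, power_watts: float) -> float:
    """Tokens per second delivered per megawatt of power drawn."""
    return tokens_per_second / (power_watts / 1_000_000)


# Hypothetical system: 3.00 USD per hour, 10,000 tokens/s, 1,000 W.
baseline_cost = cost_per_million_tokens(3.0, 10_000)
baseline_tpmw = throughput_per_megawatt(10_000, 1_000)

# At the same hourly cost, 15x the throughput gives a 15x lower cost per
# million tokens. This is plain arithmetic, not a measured result.
cost_ratio = cost_per_million_tokens(3.0, 1_000) / cost_per_million_tokens(3.0, 15_000)

print(f"{baseline_cost:.4f} USD per million tokens")
print(f"{baseline_tpmw:.0f} tokens/s per MW")
print(f"cost ratio: {cost_ratio:.1f}x")
```

With cost held fixed, cost per million tokens scales inversely with throughput, which is why software-driven throughput gains on the same hardware lower token cost directly.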

How Does Blackwell Balance Cost, Throughput, Efficiency and Responsiveness?

InferenceMAX maps performance using the Pareto frontier, a curve that shows the best trade-offs between different factors, such as data center throughput and responsiveness. But it’s more than a chart. It reflects how NVIDIA Blackwell balances the full spectrum of production priorities: cost, energy efficiency, throughput and responsiveness. That balance enables the highest ROI across real-world workloads. Systems that optimize for just one mode or scenario may show peak performance in isolation, but those economics don’t scale. Blackwell’s full-stack design delivers efficiency and
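
The Pareto frontier idea described above can be illustrated with a short Python sketch. It assumes each configuration is summarized as a hypothetical (throughput, responsiveness) pair where higher is better on both axes. The data points are invented for illustration, and this is not InferenceMAX's actual methodology or code.

```python
from typing import List, Tuple

# (throughput, responsiveness): both axes are "higher is better".
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats or ties on both axes."""
    # Sort by throughput descending (ties broken by responsiveness descending),
    # then keep a point only if it is more responsive than every point with
    # equal or higher throughput seen so far.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_responsiveness = float("-inf")
    for throughput, responsiveness in ordered:
        if responsiveness > best_responsiveness:
            frontier.append((throughput, responsiveness))
            best_responsiveness = responsiveness
    return frontier


# Hypothetical configurations, for example different batch sizes.
configs = [(100, 10), (80, 30), (90, 20), (70, 25), (50, 50), (95, 5)]
frontier = pareto_frontier(configs)
print(frontier)
```

In this invented data set, (95, 5) and (70, 25) drop out because another configuration is better on both axes; the remaining points are the trade-offs an operator would actually choose between.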

People also ask

NVIDIA GTC Taipei at COMPUTEX: Live Updates on What’s Next in AI

Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid

Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

NVIDIA and ComfyUI Streamline Local AI Video Generation for Game Developers and Creators at GDC

Snap Decisions: How Open Libraries for Accelerated Data Processing Boost A/B Testing for Snapchat

National Robotics Week — Latest Physical AI Research, Breakthroughs and Resources

NVIDIA DGX Spark and DGX Station Power the Latest Open-Source and Frontier Models From the Desktop

NVIDIA Factory Operations Blueprint Gives Factories a New AI Brain

What Are AI Tokens? The Language and Currency Powering Modern AI

Why Financial Institutions Are Converging on Transaction Foundation Models to Build Their Own Intelligence