Showing top 116 results for "AI costs & tokens"

People also ask

What Is InferenceMAX v1 and Why Does It Matter for AI Economics?

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed — it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope

Telecommunications Archives

How Is AI Shifting from Pilots to AI Factories and What’s Next?

AI is moving from pilots to AI factories: infrastructure that manufactures intelligence by turning data into tokens and decisions in real time. Open, frequently updated benchmarks help teams make informed platform choices and tune for cost per token, latency service-level agreements and utilization across changing workloads. Learn more about how to calculate the lowest cost per token and how the NVIDIA Think SMART framework drives cost-efficient inference.

Telecommunications Archives

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
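
The snippet above describes the cost-per-million-tokens equation only in words: cost per GPU per hour in the numerator, delivered token output in the denominator. A minimal sketch of that relationship follows, assuming throughput is measured in tokens per second per GPU. The function name, the $3.00 hourly rate and the throughput figures are hypothetical illustrations, not values from the source.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float,
                            tokens_per_second_per_gpu: float) -> float:
    """Cost (USD) to produce one million tokens on a single GPU.

    Numerator: what one GPU costs per hour (cloud rate or amortized
    on-premises cost). Denominator: tokens that GPU delivers per hour.
    """
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: the same hourly rate at two throughput levels.
baseline = cost_per_million_tokens(3.00, 1000.0)
doubled = cost_per_million_tokens(3.00, 2000.0)
print(f"${baseline:.2f} per 1M tokens at 1,000 tok/s")  # $0.83
print(f"${doubled:.2f} per 1M tokens at 2,000 tok/s")   # $0.42
```

With the hourly rate held fixed, doubling delivered throughput halves the cost per million tokens, which is the sense in which the denominator dominates.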

How Does Blackwell Achieve 15x Lower Cost Per Token and 10x Higher Efficiency?

Metrics like tokens per watt, cost per million tokens and TPS/user matter as much as throughput. In fact, for power-limited AI factories, Blackwell delivers 10x throughput per megawatt for mixture-of-experts models compared with the previous generation, which translates into higher token revenue. The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation.

Telecommunications Archives

Videos

Scaling Token Factory Revenue and AI Efficiency by Maximizing Performance per Watt | NVIDIA Technical Blog

…Translating efficiency into tokens: As tokens per watt increase, more billable AI work fits within a fixed power envelope, lowering cost per token and expanding margins. Realizing those gains requires closing the…

Mar 25, 2026 · Kibibi Moseley

Paper page - Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

…Xiang Liu, … Abstract: LLM inference should be evaluated as energy-to-token production under constraints of compute, power, cooling, and operational efficiency, requiring new metrics beyond traditional accuracy and latency measures. AI…

May 14, 2026

Paper page - DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

…Encoding cost stays roughly constant in K instead of scaling with it. Findings. Multi-token helps every diffusion backbone we test, on every benchmark (MS MARCO, TREC DL'19/'20, BEIR-7…

May 12, 2026

High-VRAM GPUs aren't the future of local AI — unified memory and Mixture of Experts models are

…There's also a routing step that costs you a little per token, and the irregular token-by-token routing hurts memory locality in a way that a single user (where you…

May 26, 2026 · Adam Conway

Solving the Agentic AI Trilemma – Cost, Scale, and Data Security

…Reducing Cloud Compute Token Costs for Enterprises: When testing SuperClaw versus cloud-only agentic AI solutions, SuperClaw demonstrated up to 70% reduction in average cloud compute token consumption running relevant enterprise workloads…

May 21, 2026

H100 vs GB200 NVL72 Training Benchmarks - Power, TCO, and Reliability Analysis, Software Improvement Over Time

…Million Tokens, MFU, Tokens Per US Annual Household Energy Usage, DeepSeek 670B, GB200 Unreliability, Backplane Downtime. Frontier model training has pushed GPUs and AI systems to their absolute limits, making cost, efficiency…

Aug 20, 2025 · Dylan Patel

China’s OpenClaw Boom Is a Gold Rush for AI Companies

…Token Costs: Most nontechnical users of OpenClaw have computers that are neither compatible with OpenClaw’s working environment nor powerful enough to run AI models locally, so they have to rent cloud…

Mar 13, 2026 · Zeyi Yang

Discussions and forums

Hacker News · u/tinyopsstudio · 2d ago

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Hacker News · u/Robelkidin · 3w ago

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram and Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

r/openai · u/VegetablePen4755 · 4d ago

DeepSeek just popped the American AI bubble.

Not by killing AI. By killing the fantasy of unlimited AI pricing power. DeepSeek V4 Pro: input $0.435 per 1M tokens, output $0.87 per 1M tokens. OpenAI GPT-5.5: input $5.00 …

Hacker News · u/AdarshRao23 · 2w ago

Show HN: Torrix, self hosted, LLM Observability (no Postgres, no Redis)

I work as an SAP integration consultant and built this as a side project. Friction point: most self-hosted LLM observability tools require Postgres, Redis and non-trivial infrastructure. Teams just want to see what their …

Hacker News · u/cinooo · 4w ago

Top stories

ASUS Takes the Lead in Hybrid Agentic AI Infrastructure: Maximizing Performance While Reducing Inference Costs

Dell Launches Local ‘Deskside Agentic AI’ Workstations to Slash Cloud Token Costs

OpenClaw creator reveals he used over $1,300,000 of OpenAI tokens in a month

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

What I changed in how I use Claude Code after Anthropic's postmortem

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

The Rise of AI Factories: Scaling Intelligence Across Industries

Paper page - MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference