
Showing top 116 results for "AI token costs"


People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi…

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
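The equation described above divides the hourly cost of a GPU (the numerator) by the tokens that GPU delivers in the same hour (the denominator). A minimal Python sketch, assuming a hypothetical $3.00 GPU-hour rate and 1,000 tokens/s of delivered throughput per GPU (illustrative numbers, not figures from the source):

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_gpu_per_second: float) -> float:
    """Cost (same currency as gpu_cost_per_hour) to produce one million tokens."""
    if tokens_per_gpu_per_second <= 0:
        raise ValueError("throughput must be positive")
    # Denominator: delivered token output per GPU per hour.
    tokens_per_gpu_per_hour = tokens_per_gpu_per_second * 3600
    return gpu_cost_per_hour / tokens_per_gpu_per_hour * 1_000_000


# Hypothetical inputs: $3.00 per GPU-hour, 1,000 tokens/s per GPU.
baseline = cost_per_million_tokens(3.00, 1_000)
# Doubling delivered throughput (the denominator) halves the cost per token.
doubled = cost_per_million_tokens(3.00, 2_000)
print(f"{baseline:.4f} {doubled:.4f}")  # 0.8333 0.4167
```

Holding the numerator fixed and varying only throughput makes the point of the snippet visible: the hourly rate matters, but delivered output moves cost per token just as directly.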

Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper, but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower…

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
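The gap between a roughly 2x hardware-cost ratio and a much larger cost-per-token gain comes from dividing relative hourly cost by relative delivered throughput. A sketch with hypothetical ratios (2x hourly cost, 10x throughput), chosen for illustration rather than taken from the DeepSeek-R1 data above:

```python
def relative_cost_per_token(relative_hourly_cost: float, relative_throughput: float) -> float:
    """Cost per token of a new platform as a fraction of the old platform's.

    relative_hourly_cost: new platform's cost per GPU-hour / old platform's.
    relative_throughput: new platform's delivered tokens per GPU-hour / old platform's.
    """
    if relative_throughput <= 0:
        raise ValueError("relative throughput must be positive")
    return relative_hourly_cost / relative_throughput


# Hypothetical: the new platform costs 2x per hour but delivers 10x the tokens.
ratio = relative_cost_per_token(2.0, 10.0)
print(f"cost per token is {ratio:.2f}x the old platform ({1 / ratio:.0f}x lower)")
```

A FLOPS-per-dollar comparison only captures the hardware side; the throughput ratio in the denominator reflects how much of that compute actually turns into delivered tokens.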

What Is InferenceMAX v1 and Why Does It Matter for AI Economics?

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed: it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope…

Telecommunications Archives

How Did NVIDIA Double Blackwell Performance Through Continuous Software Optimizations to Lower Token Cost?

NVIDIA doubled Blackwell performance through continuous software optimization, refining kernels, compiler paths, and inference runtimes so the same hardware delivers significantly more useful AI throughput over time. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT LLM library was market-leading, but NVIDIA’s teams and the community have significantly optimized TensorRT LLM for open-source large language models. The TensorRT LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone. Through advance…

Telecommunications Archives
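Because the hardware and its hourly cost stay fixed, a software-driven throughput gain translates directly into a lower cost per token: doubling performance, as described above, halves it. A small sketch; the release labels and the intermediate 1.5x speedup are hypothetical:

```python
def cost_reduction_from_speedup(speedup: float) -> float:
    """Fractional drop in cost per token when the same hardware, at the same
    hourly cost, delivers `speedup` times more tokens after a software update."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Hypothetical cumulative speedups on unchanged hardware.
for label, speedup in [("initial", 1.0), ("update A", 1.5), ("update B", 2.0)]:
    reduction = cost_reduction_from_speedup(speedup)
    print(f"{label}: {speedup:.1f}x throughput -> {reduction:.0%} lower cost per token")
```

Note the diminishing returns: each further speedup removes a smaller share of the remaining cost, since cost per token scales with 1/speedup.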

Videos

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

… In the generative and agentic AI era, these facilities have evolved into AI token factories. With AI inference becoming their primary workload, their primary output is intelligence manufactured in the form of tokens. …

Apr 15, 2026 · Shruti Koparkar

Telecommunications Archives

… The cost per token is crucial for evaluating AI model efficiency, directly impacting operational expenses. The NVIDIA Blackwell architecture lowered cost per million tokens by 15x versus the previous generation, leading to substantial savings and fostering wider AI deployment and innovation. …

May 7, 2026

AI quota inflation is no token effort. It's baked in

… On such a concept, the entire commercial edifice of AI hangs. Basing costs on token consumption, whether it's for code suggestion, generation, or AI debugging, makes as much sense as (or even less sense than) paying programmers per keystroke in and character out. …

Apr 20, 2026 · Rupert Goodwins
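Metering by token consumption, the model the piece criticizes, commonly bills input and output tokens at separate rates. A minimal sketch of such a meter, assuming hypothetical per-million-token prices (real providers publish their own rates):

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Prices in dollars per million tokens (hypothetical values below).
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Dollar cost of one request under per-token metering."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


# Hypothetical price sheet: $1 per million input tokens, $4 per million output tokens.
prices = Pricing(input_per_million=1.0, output_per_million=4.0)
print(request_cost(20_000, 2_000, prices))  # 0.028
```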

You’re about to feel the AI money squeeze

… That may look like “thinking through” a lot of different potential paths, launching sub-agents to do portions of a task, or verifying the accuracy of different steps of the process. “You put in your one-sentence prompt… and it’ll talk out loud to itself for thousands and thousands of tokens, thousa… …

Apr 23, 2026 · Hayden Field
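When a one-sentence prompt triggers thousands of reasoning tokens, those tokens dominate the bill. A sketch, assuming (as a hypothetical billing rule; providers differ) that hidden reasoning tokens are charged at the output-token rate, with made-up token counts and prices:

```python
def query_cost(prompt_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
               input_price_per_million: float, output_price_per_million: float) -> float:
    """Cost of one query when reasoning tokens are billed like output tokens
    (an assumption; billing rules vary by provider)."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (prompt_tokens * input_price_per_million
            + billed_output * output_price_per_million) / 1_000_000


# Hypothetical: ~20-token prompt and ~200-token answer, with and without
# 5,000 reasoning tokens, at $1 / $4 per million input / output tokens.
plain = query_cost(20, 200, 0, 1.0, 4.0)
reasoning = query_cost(20, 200, 5_000, 1.0, 4.0)
print(f"{reasoning / plain:.1f}x more expensive with reasoning")
```

The prompt barely registers in either case; the multiplier comes almost entirely from tokens the model generates while "talking to itself".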

The Many Aspects of Inference Performance

… To illustrate the impact of software optimization on cost per token: since February, MI355X GPU cost per token has dropped significantly, while GB300 NVL72 remains higher and unchanged (Figure 2). Figure 2: Cost per million tokens over time, at interactivity 100 TPS/user (DeepSeek R1, FP8, no MTP). …

May 11, 2026 · AMD AI Group
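Figure 2 fixes interactivity at 100 TPS/user, meaning each user's stream must run at that rate; cost per million tokens then depends on how many such streams a system sustains concurrently. A sketch, assuming a hypothetical system hourly cost and concurrency level (not AMD's measurements):

```python
def cost_per_million_at_interactivity(system_cost_per_hour: float,
                                      concurrent_users: int,
                                      tps_per_user: float) -> float:
    """Cost per million tokens for a system serving `concurrent_users` streams,
    each generating `tps_per_user` tokens/s (the interactivity level)."""
    aggregate_tokens_per_hour = concurrent_users * tps_per_user * 3600
    if aggregate_tokens_per_hour <= 0:
        raise ValueError("aggregate throughput must be positive")
    return system_cost_per_hour / aggregate_tokens_per_hour * 1_000_000


# Hypothetical system: $100/hour total, 64 users each served at 100 TPS/user.
print(round(cost_per_million_at_interactivity(100.0, 64, 100.0), 4))
```

Pinning interactivity is what makes cost curves comparable over time: a software update that lets the same system hold more 100 TPS/user streams shows up directly as a lower cost per million tokens.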

Nvidia Software Pushes MLPerf Inference Benchmarks To New Highs

… Salvatore echoed Huang’s argument made at GTC, that Nvidia’s platforms – while expensive – improve token generation. “Increases in token generation or increases in performance basically generate more revenue, they reduce costs, they get you more value from the same infrastructure,” he said. “This … …

Apr 2, 2026 · Jeff Burt
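The argument that more token generation both raises revenue and lowers cost can be made concrete: on the same infrastructure, hourly cost is fixed while revenue scales with throughput. A sketch with a hypothetical selling price and node cost:

```python
def hourly_margin(tokens_per_second: float, price_per_million: float, cost_per_hour: float) -> float:
    """Hourly revenue from selling generated tokens minus hourly infrastructure cost."""
    revenue = tokens_per_second * 3600 * price_per_million / 1_000_000
    return revenue - cost_per_hour


# Hypothetical: tokens sold at $2 per million, node costs $40/hour.
for tps in (5_000, 10_000):
    print(tps, hourly_margin(tps, 2.0, 40.0))
```

In this illustration, doubling throughput on the same node turns a $4/hour loss into a $32/hour margin.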

Unpacking the deceptively simple science of tokenomics

… On top of how efficient at churning out tokens Nvidia and AMD's AI accelerators are, InferenceX also tracks inference costs. The closer the Pareto curve gets to the bottom right corner, the better value those tokens are. …

Mar 7, 2026 · Tobias Mann
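A Pareto curve like the one described above keeps only configurations that no other configuration beats on both axes at once: interactivity (higher is better) and cost per million tokens (lower is better), so the ideal corner is bottom right. A minimal sketch of extracting that frontier from benchmark points; the sample points are made up, not InferenceX data:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (interactivity in tok/s/user, cost per million tokens)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return points not dominated by any other point. A point is dominated if
    another has interactivity >= and cost <=, with at least one strictly better."""
    # Visit points from highest interactivity down; ties broken by lowest cost.
    ordered = sorted(points, key=lambda p: (-p[0], p[1]))
    frontier: List[Point] = []
    best_cost = float("inf")
    for interactivity, cost in ordered:
        # Every point seen so far has interactivity >= this one, so this point
        # survives only if it is strictly cheaper than all of them.
        if cost < best_cost:
            frontier.append((interactivity, cost))
            best_cost = cost
    return sorted(frontier)


samples = [(20, 1.0), (50, 2.0), (50, 3.0), (100, 5.0), (80, 6.0)]
print(pareto_frontier(samples))
```

The single sort plus one pass runs in O(n log n), which is plenty for the handful of configurations a benchmark sweep typically produces.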

Discussions and forums

Hacker News · u/tinyopsstudio · 1d ago

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Hacker News · u/Robelkidin · 3w ago

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, and Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

r/selfhosted · u/narrow-adventure · 3w ago

MIT-licensed Sentry + Datadog replacement, self-hosts in ~90 seconds

Hi, I've been working on an open-source observability stack that is really easy to self host. About 6 months ago I got super frustrated by paying for Sentry and hosting a bunch of services (otel collector, prometheus, gr…

Hacker News · u/AdarshRao23 · 2w ago

Show HN: Torrix, self hosted, LLM Observability,(no Postgres, no Redis)

I work as an SAP integration consultant and built this as a side project. Friction point: most self-hosted LLM observability tools require Postgres, Redis and non-trivial infrastructure. Teams just want to see what their …


Hacker News · u/cinooo · 3w ago

What I changed in how I use Claude Code after Anthropic's postmortem

After watching Anthropic's recent postmortem (anthropic.com/engineering/april-23-postmortem), I've been thinking about the way I approach Claude Code differently. They lowered default reasoning effort to fix latency, cal…


Building for the Rising Complexity of Agentic Systems with Extreme Co-Design | NVIDIA Technical Blog

By Eduardo Alvarez, Benjamin Klieger and Graham Steele. Agentic AI architectures feature hierarchical agents and sub-a… …

May 5, 2026 · Eduardo Alvarez

What Are AI Tokens? The Language and Currency Powering Modern AI

… Training an AI model starts with the tokenization of the training dataset. Based on the size of the training data, the token count can reach billions or trillions, and, per the pretraining scaling law, the more tokens used for training, the better the quality of the AI model. …

Mar 17, 2025 · Dave Salvator

$200 'socketed' Nvidia AI GPU for servers hacked into a PCIe card with custom PCB and 3D-printed cooling: modded Tesla V100 SXM data center GPU runs AI LLMs and is more efficient than many modern midrange offerings in AI inference

… To make the comparison fair, the YouTuber also limited the 3060 to 100W; it ended up consuming 171W and producing just 68 tokens per second. So, with both new results, the V100 achieves an efficiency score of 0.55 tokens/s per watt, while the 3060 12 GB was stuck at 0.39 tokens/s per watt. …

May 10, 2026 · Hassam Nasir
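The efficiency score in the snippet is throughput divided by measured power draw. A sketch reproducing the RTX 3060 figure from the two numbers given (68 tokens/s at 171 W); the excerpt does not include the V100's raw throughput and power, so its 0.55 cannot be recomputed here:

```python
import math


def tokens_per_second_per_watt(tokens_per_second: float, watts: float) -> float:
    """Inference efficiency: generation throughput per watt of measured power draw."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return tokens_per_second / watts


rtx3060 = tokens_per_second_per_watt(68, 171)
print(f"{rtx3060:.4f}")                  # unrounded ratio: 0.3977
print(math.floor(rtx3060 * 100) / 100)  # truncated to two decimals: 0.39
```

The reported 0.39 matches the ratio truncated to two decimal places; conventional rounding would give 0.40.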


Top stories

Building Token‑Metered AI Services on Telco AI Factories | NVIDIA Technical Blog

Dell Launches Local ‘Deskside Agentic AI’ Workstations to Slash Cloud Token Costs

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

Paper page - Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

The Many Aspects of Inference Performance

Nvidia Software Pushes MLPerf Inference Benchmarks To New Highs

Unpacking the deceptively simple science of tokenomics
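The OpenClaw headline gives enough figures to derive two simple monthly averages: the blended cost per million tokens and the mean tokens per request. Both are plain averages of the headline numbers and say nothing about the input/output split or per-model pricing:

```python
# Figures from the OpenClaw headline (one month of usage).
total_cost_usd = 1.3e6
total_tokens = 603e9
requests = 7.6e6

# Blended average price across all token types and models.
cost_per_million = total_cost_usd / total_tokens * 1e6
# Mean token volume per API request.
tokens_per_request = total_tokens / requests
print(f"${cost_per_million:.2f} per million tokens, {tokens_per_request:,.0f} tokens per request")
```

At roughly 79,000 tokens per request, the volume rather than the unit price drives the seven-figure bill.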
