Showing top 114 results for "AI cost and tokens"

People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
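The relationship described in the snippet above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own formula: the function name, the choice of tokens per second per GPU as the throughput unit, and every dollar and throughput figure are assumptions made for the example.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_second_per_gpu: float) -> float:
    """Dollars per one million delivered tokens for a single GPU.

    Numerator: hourly GPU cost (cloud rate or amortized on-premises cost).
    Denominator: tokens actually delivered in that hour.
    """
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures, chosen only to show the effect of the denominator.
baseline = cost_per_million_tokens(gpu_cost_per_hour=2.0,
                                   tokens_per_second_per_gpu=1_000.0)
pricier_but_faster = cost_per_million_tokens(gpu_cost_per_hour=4.0,
                                             tokens_per_second_per_gpu=10_000.0)

print(f"baseline:           ${baseline:.4f} per 1M tokens")
print(f"pricier but faster: ${pricier_but_faster:.4f} per 1M tokens")
```

With these made-up inputs, doubling the hourly rate while delivering ten times the tokens still cuts cost per million tokens by a factor of five, which is the point the snippet makes about the denominator.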

Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper — but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

Claude users are teaching it to talk like a caveman. Here's why

It's no secret that Claude gobbles up tokens like a Corvette guzzles gas—and just like gas, tokens cost money. That's why the heaviest Claude users are always looking for…

Apr 15, 2026 · By Ben Patterson

Anthropic: Claude quota drain not caused by cache tweaks

…Writing to the five-minute cache costs 25 percent more in tokens, and writing to the one-hour cache 100 percent more, but reading from cache is around 10 percent of the…

Apr 13, 2026 · Tim Anderson
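The cache multipliers quoted in the snippet above lend themselves to a quick break-even check. The sketch below is illustrative only: it assumes the truncated "10 percent of the…" refers to the base input-token price, that every repeat call lands while the cache entry is still live, and it measures cost in "base-token units" rather than dollars. The function and constant names are hypothetical.

```python
from typing import Optional

# Multipliers relative to the base input-token price, per the snippet.
CACHE_WRITE_5M = 1.25  # five-minute cache: 25 percent more
CACHE_WRITE_1H = 2.00  # one-hour cache: 100 percent more
CACHE_READ = 0.10      # assumption: about 10 percent of the base input price


def prefix_cost_units(prefix_tokens: int, calls: int,
                      write_multiplier: Optional[float]) -> float:
    """Cost of sending the same prompt prefix `calls` times, in base-token units.

    write_multiplier=None means no caching: every call pays full price.
    Otherwise the first call writes the cache and later calls read from it
    (assumes every later call is a cache hit).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    if write_multiplier is None:
        return float(prefix_tokens * calls)
    write_cost = prefix_tokens * write_multiplier
    read_cost = prefix_tokens * CACHE_READ * (calls - 1)
    return write_cost + read_cost


# Hypothetical workload: a 10,000-token prefix reused across 10 calls.
uncached = prefix_cost_units(10_000, 10, None)
cached_5m = prefix_cost_units(10_000, 10, CACHE_WRITE_5M)
cached_1h = prefix_cost_units(10_000, 10, CACHE_WRITE_1H)
print(uncached, cached_5m, cached_1h)
```

Under these assumptions a cache write only pays off when the prefix is reused: a single call with a five-minute cache write costs more than the uncached call, while the second call onward is where the savings appear.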

Inference Performance for Data Center Deep Learning

…Metrics such as tokens per watt, cost per million tokens, and tokens per second per user are crucial alongside throughput. For power-limited AI factories, NVIDIA's continuous software improvements translate into…

Google just tested a bunch of new AI models for Android app coding – here are the rankings

…But this latest update also puts things into perspective much better, as Google now shows the average latency, total tokens used, and the average cost of using each AI model. Google details…

May 21, 2026 · Ben Schoon

Ramp raises $750M at $44B valuation as investors hunger for fintechs with an AI story | TechCrunch

…AI token usage and costs have lately come into focus as companies look for ROI in AI and control expenditures from AI usage. Uber recently set a cap of $1,500 per…

Jun 4, 2026 · Ram Iyer

Goldman Sachs Bets Big On Agentic AI Boom By 2040, But Warns Bad Data Could Leave A Bad Taste

…As for the costs, the bank believes that, with the latest chips from NVIDIA and AMD, as well as those such as Trainium, the costs per token computation are dropping by as much…

May 9, 2026 · Ramish Zafar

Sam Altman makes 'mic drop' offer to every Y Combinator startup | TechCrunch

…The pro-deal folks believe the deal helps startups eliminate one of their biggest costs — AI infrastructure bills, which can spiral fast and consume a disproportionate share of an early-stage startup…

May 20, 2026 · Julie Bort

Discussions and forums

Hacker News · u/tinyopsstudio · May 26, 2026

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Hacker News · u/BEKOUTI · 3d ago

Value for Money Is All You Need

A reflection on the future of token consumption in artificial intelligence. Token consumption now sits at the center of the growing use of artificial intelligence by businesses and individual…

r/LocalLLaMA · u/Scared-Biscotti2287 · 4w ago

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to so…

Hacker News · u/Robelkidin · May 5, 2026

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again, a token usage meter for 12+ AI providers: Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

Hacker News · u/arhaam · 1w ago

Show HN: Open-source CLI to see your AI coding token usage and compare it

I use Claude Code, Codex, Cursor every day and had no idea how much I was actually burning across all of them combined. Each tool shows its own usage (most don't) in its own place, if at all, and I just wanted one number…

LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? | NVIDIA Technical Blog

…Learn how to lower your cost per token and maximize AI models with The IT Leader’s Guide to AI Inference and Performance. Learn more about how to calculate the lowest cost…

Jun 18, 2025 · Vinh Nguyen

NVIDIA Delivers Day-1 Support For DeepMind's DiffusionGemma Open Model Across RTX & DGX Platforms, 150 Tokens/s With DGX Spark

…hardware. Open and local: DiffusionGemma is open-weight under a permissive Apache 2.0 license and runs entirely on RTX and DGX Spark — no cloud, no per-token cost — with day-zero…

Jun 10, 2026 · Hassan Mujtaba

Customers revolt as GitHub Copilot 'fixes' rate limits

Repair of bug that undercounted token usage leads to rapid exhaustion of subscription allowance. Microsoft's GitHub last week told Copilot customers…

Apr 15, 2026 · Thomas Claburn
