Search

Showing top 115 results for "AI cost and tokens"

People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
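
The relation described in the snippet above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tokens-per-second throughput unit, and all dollar and throughput figures are hypothetical and do not come from the article. It shows why raising delivered token output (the denominator) can outweigh a higher hourly GPU price (the numerator).

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_gpu_per_second: float) -> float:
    """Hourly GPU cost divided by tokens delivered per GPU-hour, scaled to 1M tokens.

    Both parameters are illustrative assumptions: the snippet only names
    "cost per GPU per hour" and "delivered token output".
    """
    if tokens_per_gpu_per_second <= 0:
        raise ValueError("delivered token throughput must be positive")
    tokens_per_gpu_hour = tokens_per_gpu_per_second * 3600
    return gpu_cost_per_hour / tokens_per_gpu_hour * 1_000_000


# Hypothetical figures, chosen only to show the shape of the trade-off.
baseline = cost_per_million_tokens(gpu_cost_per_hour=3.0,
                                   tokens_per_gpu_per_second=1_000)
# Twice the hourly price, but ten times the delivered token output.
pricier_but_faster = cost_per_million_tokens(gpu_cost_per_hour=6.0,
                                             tokens_per_gpu_per_second=10_000)

print(f"baseline:           ${baseline:.4f} per 1M tokens")
print(f"pricier but faster: ${pricier_but_faster:.4f} per 1M tokens")
```

With these made-up inputs, the pricier GPU still yields a lower cost per million tokens, because the denominator grew faster than the numerator.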

Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper — but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

Videos

Anthropic tweaks Claude usage limits to manage capacity

…secure Anthropic sells its AI services in two forms: an API and subscriptions. API customers pay a published rate for various forms of token usage – Base Input Tokens, 5m Cache Writes, 1h…

Mar 26, 2026 · Thomas Claburn

The agentic AI boom is here; operations will decide who wins

…a secure, consistent front door to both private and public models. With centralized authentication, observability, token-based governance, and granular cost controls, enterprises gain transparency and predictability in AI consumption. Support for…

Mar 18, 2026 · Tuhina Goel, director product marketing, AI at Nutanix

DeepSeek Aims At Memory Shortage With Latest AI Model But Might Sacrifice Performance

…DeepSeek claims that the V4 AI model requires just 27% of the single-token inference FLOPs and 10% of the key-value (KV) cache compared with its predecessor, the DeepSeek V3.2 model. The…

Apr 24, 2026 · Ramish Zafar

Agent Computers: Pay Once for Cloud-Grade Intelligence

…tokens and $15 per million output tokens. That pricing model works well for flexible, on-demand access to frontier AI, but it also means that sustained agent workloads can become costly as…

May 27, 2026 · AMD AI Group

Claude AI: What's free in 2026 and what isn't? - Engadget

…every question has its own unique compute cost. It's also for that reason that Anthropic recommends you keep your prompts concise and clear. And please, don't waste tokens thanking Claude…

Jun 3, 2026 · Igor Bonifacic

Accelerate Token Production in AI Factories Using Unified Services and Real-Time AI | NVIDIA Technical Blog

…to stranded power and reduced tokens per watt, silently eroding factory output at scale. As AI factories scale to thousands of GPUs running diverse mission critical workloads, the cost of unpredictable congestion…

Apr 1, 2026 · Pradyumna Desale

'The math doesn't work': Why your $200 AI subscription is secretly worth thousands

…For instance, a quick question and a short answer might be a few hundred tokens. But the way people increasingly use these tools is far more demanding. When you hand an AI…

Jun 17, 2026 · Amanda Caswell

Discussions and forums

Hacker News · u/tinyopsstudio · May 26, 2026

Top stories

Ditching the cloud for local AI — how I use two mini PCs to process millions of tokens a day and save money on costly API fees

OpenAI debuts Jalapeño, its first custom AI chip to cut ChatGPT costs and reduce Nvidia dependency

Microsoft Risks Trump's Ire By Abandoning The Costly OpenAI And Anthropic Models For China-Based DeepSeek's V4 Model For Enterprise Workloads

Tensordyne's 3nm Napier AI Chip Promises 13x Higher Token Throughput Than Blackwell & Blazes Past Rubin With 1000 Tokens/s In Multi-Trillion Parameter Models

Discussions and forums

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Value for Money Is All You Need

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Show HN: Token Usage Meter 12 Providers and Coding Agent

Show HN: Open-source CLI to see your AI coding token usage and compare it

Building the AI Grid with NVIDIA: Orchestrating Intelligence Everywhere | NVIDIA Technical Blog

Gemini 3.5 Flash lands on Google's Android coding rankings, but it's 3x the cost for slower performance

Paper page - The First Token Knows: Single-Decode Confidence for Hallucination Detection