Search

Showing top 116 results for "AI token costs"

People also ask

What Are the Factors That Lower Token Cost?

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens. In this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it’s the effective hourly cost derived from amortizing owned infrastructure. The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output. That denominator carries two business implications. Minimize token cost: When thi

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
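
The snippet above describes cost per million tokens as an hourly GPU cost divided by delivered token output. A minimal sketch of that arithmetic follows, assuming the denominator is measured as sustained tokens per second per GPU; the function name and all numbers are hypothetical and are not taken from the article.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars needed to produce one million tokens on a single GPU.

    Assumption: throughput is a sustained per-GPU rate in tokens per second.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures, for illustration only.
baseline = cost_per_million_tokens(gpu_cost_per_hour=2.0, tokens_per_second=1_000)
# Doubling the hourly rate (numerator) while delivering 10x the tokens
# (denominator) still lowers the cost per million tokens.
pricier_but_faster = cost_per_million_tokens(gpu_cost_per_hour=4.0, tokens_per_second=10_000)

print(f"baseline:           ${baseline:.4f} per 1M tokens")
print(f"pricier but faster: ${pricier_but_faster:.4f} per 1M tokens")
```

With these made-up inputs, the system that costs twice as much per hour comes out about five times cheaper per token, which is the snippet's point about the denominator.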

Why Does Cost per Token Matter Much More Than FLOPS per Dollar?

The following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes. Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper — but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

What Is InferenceMAX v1 and Why Does It Matter for AI Economics?

InferenceMAX v1, a new benchmark from SemiAnalysis released Monday, is the latest to highlight Blackwell’s inference leadership. It runs popular models across leading platforms, measures performance for a wide range of use cases and publishes results anyone can verify. Why do benchmarks like this matter? Because modern AI isn’t just about raw speed — it’s about efficiency and economics at scale. As models shift from one-shot replies to multistep reasoning and tool use, they generate far more tokens per query, dramatically increasing compute demands. NVIDIA’s open-source collaborations with Ope

Telecommunications Archives

How Did NVIDIA Double Blackwell Performance Through Continuous Software Optimizations to Lower Token Cost?

NVIDIA doubled Blackwell performance through continuous software optimization, refining kernels, compiler paths, and inference runtimes so the same hardware delivers significantly more useful AI throughput over time. Initial gpt-oss-120b performance on an NVIDIA DGX Blackwell B200 system with the NVIDIA TensorRT LLM library was market-leading, but NVIDIA’s teams and the community have significantly optimized TensorRT LLM for open-source large language models. The TensorRT LLM v1.0 release is a major breakthrough in making large AI models faster and more responsive for everyone. Through advance

Telecommunications Archives

Videos

Vibe coding tool GitHub Copilot is moving to usage-based billing

…or stores). Each token is priced based on the model used, and the total is converted into AI credits, where 1 AI credit = $0.01 USD. The cost of an interaction depends…

Apr 28, 2026 · Chris Kerr

LLM Inference Benchmarking: How Much Does Your LLM Inference Cost? | NVIDIA Technical Blog

…Learn how to lower your cost per token and maximize AI models with The IT Leader’s Guide to AI Inference and Performance . Learn more about how to calculate the lowest cost…

Jun 18, 2025 · Vinh Nguyen

Startups Brag They Spend More Money on AI Than Human Employees

…Startup CEOs who are “tokenmaxxing” are bragging that they are spending more money on AI compute than it would cost to hire human workers. Astronomical AI bills are now, in a certain…

Apr 22, 2026 · Jason Koebler

DeepSeek permanently reduces the price of its flagship V4 model by 75 percent - Engadget

…cost savings for enterprise accounts or power users who go through millions of tokens in a day. The major price drop also presents a more affordable alternative to other popular AI models…

May 23, 2026 · Jackson Chen

Anthropic admits Claude Code quotas running out too fast

…Bugs aside, what we are seeing is an implicit negotiation between users and providers over what is an acceptable pricing and usage model for AI development. Users want to control costs and…

Mar 31, 2026 · Tim Anderson

How AI Factories Generate Revenue: A Guide to Optimized Inference Economics

…The primary product is intelligence, how efficiently the AI factory can produce the lowest cost per token, which drives decisions, automation and new AI solutions. AI is creating value for everyone — from…

May 15, 2025 · Kyle Aubrey

Microsoft's GitHub suspends Copilot account sign-ups

…study finds flattery will get AI everywhere AI is reshaping Britain's datacenter map away from London Now, as part of GitHub's cost cutting and service realignment, Binder said the operation…

Apr 20, 2026 · Thomas Claburn

Discussions and forums

Hacker News · u/tinyopsstudio · 2d ago

Show HN: AI agent token cost calculator for Codex and Claude Code loops

Hacker News · u/Robelkidin · 3w ago

Show HN: Token Usage Meter 12 Providers and Coding Agent

Here once again: a token usage meter for 12+ AI providers, including Anthropic, OpenAI, Google, Alibaba Qwen, Moonshot Kimi, MiniMax, ElevenLabs, Deepgram, and Perplexity. Qlaud.ai provides a token usage meter / AI billing layer. Also Ql…

r/openai · u/VegetablePen4755 · 4d ago

DeepSeek just popped the American AI bubble.

DeepSeek just popped the American AI bubble. Not by killing AI. By killing the fantasy of unlimited AI pricing power. DeepSeek V4 Pro: Input: $0.435 per 1M tokens Output: $0.87 per 1M tokens OpenAI GPT-5.5: Input: $5.00 …
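
Per-million-token prices like the ones quoted in this post translate into a bill through simple arithmetic. The sketch below uses only the DeepSeek V4 Pro prices quoted above; the function name and the token counts are hypothetical.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a workload priced separately per 1M input and output tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Prices as quoted in the post; the token counts are made up for illustration.
cost = request_cost(
    input_tokens=2_000_000,
    output_tokens=1_000_000,
    input_price_per_m=0.435,
    output_price_per_m=0.87,
)
print(f"${cost:.2f}")
```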

Hacker News · u/AdarshRao23 · 2w ago

Show HN: Torrix, self hosted, LLM Observability,(no Postgres, no Redis)

I work as a SAP Integration consultant and built this as a side project. Friction point: Most self hosted LLM observability tools require Postgres, Redis and non trivial infrastructure. Teams just want to see what their …

Hacker News · u/cinooo · 4w ago

Top stories

ASUS Takes the Lead in Hybrid Agentic AI Infrastructure- Maximizing Performance While Reducing Inference Costs

Building Token‑Metered AI Services on Telco AI Factories | NVIDIA Technical Blog

Dell Launches Local ‘Deskside Agentic AI’ Workstations to Slash Cloud Token Costs

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

Discussions and forums

What I changed in how I use Claude Code after Anthropic's postmortem

NVIDIA Platform Delivers Lowest Token Cost Enabled by Extreme Co-Design | NVIDIA Technical Blog

Anthropic tweaks Claude usage limits to manage capacity

GitHub Copilot's price shakeup could end cheap AI coding as we know it