Showing top 4 results for "AI token cost pressure"

People also ask

Why the rush?

One of the defining characteristics of SRAM-heavy architectures from Groq and its rival Cerebras is that they are very fast when running LLM inference workloads, routinely generating more than 500 and even 1,000 tokens per second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Huang describes as test-time scaling. The idea is that by letting "reasoning" models generate more "thinking" tokens, they can produce smarter, more accurate results. So, the faster you can generate tokens, the les…
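
The test-time scaling argument comes down to simple division: the time a user waits for a reasoning model to finish "thinking" is the number of thinking tokens divided by the generation rate. The sketch below shows that arithmetic; the function names, the 10,000-token reasoning budget, and the 50 tokens-per-second baseline are hypothetical illustrations, while 500 and 1,000 tokens per second are the rates quoted above.

```python
# Sketch of the test-time scaling arithmetic: wait time = thinking tokens / generation rate.
# The 10,000-token reasoning budget and the 50 tok/s baseline are hypothetical;
# 500 and 1,000 tok/s are the rates cited for SRAM-heavy chips.

def thinking_latency_s(thinking_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent generating reasoning tokens before the answer begins."""
    return thinking_tokens / tokens_per_second


def thinking_budget(latency_budget_s: float, tokens_per_second: float) -> int:
    """Whole number of reasoning tokens that fit into a fixed wait time."""
    return int(latency_budget_s * tokens_per_second)


if __name__ == "__main__":
    budget = 10_000  # hypothetical reasoning budget, in tokens
    for rate in (50, 500, 1_000):  # generation rates, in tokens per second
        wait = thinking_latency_s(budget, rate)
        print(f"{rate:>5} tok/s: {wait:6.1f} s to think through {budget:,} tokens")
    # Same 10-second wait at two rates: the faster chip can "think" for longer.
    print(thinking_budget(10, 500), thinking_budget(10, 1_000))
```

Under these assumptions, a 10,000-token chain of thought takes 10 seconds at 1,000 tokens per second versus 200 seconds at 50, and a fixed 10-second wait buys 10,000 thinking tokens instead of 5,000.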

A closer look at Nvidia's Groq-powered LPX rack systems

Who is LPX for?

If you're not a hyperscaler, neocloud, or model dev, LPX is probably not for you. The sheer number of LPUs required to serve large open models will likely put Nvidia's LPX platform out of reach for most enterprises. Speaking to press ahead of this week's keynote, Buck said Nvidia is focusing primarily on model builders and service providers that need to serve trillion-plus-parameter models at more than 500 to 1,000 tokens per second. Having said that, in a technical blog, Nvidia presented another use case for the LPUs as a speculative decode accelerator, something we suggested the company mi…
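
In general terms, speculative decoding pairs a fast draft model with the large target model: the draft proposes several tokens, and the target checks them all in one pass, keeping the accepted prefix plus one token of its own. The sketch below uses the standard back-of-envelope estimate from the speculative decoding literature, assuming each drafted token is accepted independently with probability `alpha`; the function name and parameters are illustrative, and the snippet does not say how Nvidia actually splits the work between LPUs and GPUs.

```python
# Rough estimate of speculative decoding gains, assuming every drafted token is
# accepted independently with probability alpha (a simplifying assumption).

def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification pass of the large target model.

    alpha: per-token acceptance probability of the draft model's guesses
    gamma: number of tokens the draft model proposes before each verification
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token accepted, plus the one token the target adds itself.
        return float(gamma + 1)
    # Geometric series 1 + alpha + ... + alpha**gamma, in closed form.
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


if __name__ == "__main__":
    for alpha in (0.5, 0.8, 0.9):
        tokens = expected_tokens_per_step(alpha, gamma=4)
        print(f"alpha={alpha}: {tokens:.2f} tokens per target pass")
```

Each target-model pass is the expensive step, so more accepted tokens per pass means fewer passes per answer. Because the draft model runs on the critical path before every verification, a very fast draft engine is what keeps the overall scheme ahead.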

What happened to Rubin CPX?

You may be scratching your head, wondering "wasn't there supposed to be some kind of special Rubin chip optimized for large-context prefill processing?" You're not hallucinating. Back at Computex last northern spring, Nvidia unveiled the Rubin CPX, a version of Rubin that used slower, less expensive GDDR7 memory to speed up the time to first token (how long users or agents have to wait for the model to start generating an output) when working with large inputs. The idea was that Rubin CPX could cut down on wait times for applications that might involve processing large quantities of document…

Microsoft's GitHub suspends Copilot account sign-ups

… Now, as… …

Apr 20, 2026 · Thomas Claburn

AI still doesn't work very well in business, reckoning soon

… And that accelerated collapse is then going to cost a lot of people their jobs." Another likely outcome, said Smiley, is pricing pressure – companies will ask for discounts when they know a service company is using AI tools. Deeks said extreme pricing pressure is starting to surface. …

Mar 17, 2026 · Thomas Claburn

A closer look at Nvidia's Groq-powered LPX rack systems

… Nvidia is also under some pressure to maintain its dominance of the AI infrastructure market as rival chip designers like AMD close the gap on hardware and software. …

Mar 19, 2026 · Tobias Mann

Cheap Chinese models are overtaking Anthropic

… Against this backdrop, recent cost-saving moves designed to reduce token demand during peak hours fail to inspire optimism. But there's a more fundamental risk – remaining relevant in the face of increasingly capable competition from China. …

Mar 28, 2026 · Thomas Claburn