Search

Showing top 4 results for "AI token cost pressure"

People also ask

Why the rush?

One of the defining characteristics of SRAM-heavy architectures from Groq and its rival Cerebras is that they are very fast when running LLM inferencing workloads, routinely achieving generation rates exceeding 500 and even 1000 tokens a second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Huang describes as test-time scaling. The idea is that by letting "reasoning" models generate more "thinking" tokens, they can produce smarter, more accurate results. So, the faster you can generate tokens, the les

A closer look at Nvidia's Groq-powered LPX rack systems
Who is LPX for?

If you're not a hyperscaler, neocloud, model dev, LPX is probably not for you. The sheer number of LPUs required to serve large open models will likely put Nvidia's LPX platform out of reach for most enterprises. Speaking to press ahead of this week's keynote, Buck said Nvidia is focusing primarily on model builders and service providers that need to serve trillion-plus-parameter models with token rates exceeding 500 to 1,000 a second. Having said that, in a technical blog, Nvidia presented another use case for the LPUs as a speculative decode accelerator, something we suggested the company mi

A closer look at Nvidia's Groq-powered LPX rack systems
What happened to Rubin CPX?

You may be scratching your head, wondering "wasn't there supposed to be some kind of special Rubin chip optimized for large-context prefill processing?" You're not hallucinating. Back at Computex last northern spring, Nvidia unveiled the Rubin CPX, a version of Rubin that used slower, less expensive GDDR7 memory to speed up the time to first token – how long users or agents have to wait for the model to start generating an output – when working with large inputs. The idea was that Rubin CPX could cut down on wait times for applications that might involve processing large quantities of document

A closer look at Nvidia's Groq-powered LPX rack systems