Search

Showing top 12 results for "AI models announced"

Filtered by topic: NVIDIA Clear ✕

People also ask

Why the rush?

One of the defining characteristics of SRAM-heavy architectures from Groq and its rival Cerebras is that they are very fast when running LLM inferencing workloads, routinely achieving generation rates exceeding 500 and even 1000 tokens a second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Huang describes as test-time scaling. The idea is that by letting "reasoning" models generate more "thinking" tokens, they can produce smarter, more accurate results. So, the faster you can generate tokens, the les

A closer look at Nvidia's Groq-powered LPX rack systems
Who is LPX for?

If you're not a hyperscaler, neocloud, model dev, LPX is probably not for you. The sheer number of LPUs required to serve large open models will likely put Nvidia's LPX platform out of reach for most enterprises. Speaking to press ahead of this week's keynote, Buck said Nvidia is focusing primarily on model builders and service providers that need to serve trillion-plus-parameter models with token rates exceeding 500 to 1,000 a second. Having said that, in a technical blog, Nvidia presented another use case for the LPUs as a speculative decode accelerator, something we suggested the company mi

A closer look at Nvidia's Groq-powered LPX rack systems