Nvidia slaps Groq into new LPX racks for faster AI response
… OpenAI is using Cerebras' dinner-plate-sized accelerators to achieve nearly instantaneous code generation for models like GPT-5.3 Codex-Spark. …
One of the defining characteristics of the SRAM-heavy architectures from Groq and its rival Cerebras is speed: on LLM inference workloads they routinely generate more than 500, and sometimes more than 1,000, tokens per second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Nvidia CEO Jensen Huang describes as test-time scaling. The idea is that letting "reasoning" models generate more "thinking" tokens allows them to produce smarter, more accurate results. So, the faster you can generate tokens, the les…
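To make the speed-versus-reasoning trade-off concrete, here is a back-of-the-envelope sketch. It assumes a constant generation rate and ignores time-to-first-token. The 10-second latency budget, the 20,000-token reasoning trace, and the 100 tokens-per-second baseline are hypothetical figures chosen purely for illustration; only the 500 and 1,000 tokens-per-second rates come from the text.

```python
# Back-of-the-envelope sketch: how generation speed bounds test-time scaling.
# Assumptions (hypothetical): a constant decode rate, no prefill or
# time-to-first-token overhead, and illustrative budget/trace sizes.


def thinking_tokens_within_budget(tokens_per_second: float, budget_seconds: float) -> int:
    """Maximum number of reasoning tokens a model can emit inside a latency budget."""
    return int(tokens_per_second * budget_seconds)


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time needed to emit num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


BUDGET_S = 10.0            # hypothetical: how long a user is willing to wait
REASONING_TOKENS = 20_000  # hypothetical: length of a long chain of thought

# 100 tok/s is a hypothetical slower baseline; 500 and 1,000 come from the article.
for rate in (100, 500, 1000):
    budget = thinking_tokens_within_budget(rate, BUDGET_S)
    wait = seconds_to_generate(REASONING_TOKENS, rate)
    print(f"{rate:>5} tok/s: {budget:>6} thinking tokens fit in {BUDGET_S:.0f}s; "
          f"a {REASONING_TOKENS}-token trace takes {wait:.0f}s")
```

Under these assumptions, the same 20,000-token trace takes 200 seconds at 100 tokens per second but 20 seconds at 1,000 tokens per second. Equivalently, a fixed 10-second wait buys ten times as many "thinking" tokens at the higher rate.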
A closer look at Nvidia's Groq-powered LPX rack systems
… In fact, this capability is how Cerebras won OpenAI's business earlier this year to power its Codex model . Nvidia didn’t own anything to match Cerebras until it acquired Groq's intellectual property and talent for a staggering $20 billion in December. …
… To meet that demand for its software and LLMs, OpenAI is working with an "infrastructure portfolio across multiple cloud partners and multiple chip platforms." OpenAI's infrastructure partners include Microsoft, Oracle, AWS, CoreWeave, and Google Cloud, while its chip suppliers include Nvidia, AMD, …
… Last week, Amazon and Cerebras announced a collaboration to pair AWS' Trainium3 accelerators with Cerebras' wafer-scale accelerators for many of the same reasons Nvidia built LPX. …
… OpenAI, SAP, Cerebras, Cloudflare, F5, SK Telecom, and Rebellions are also listed as early customers. …
… Some inference providers, like Cerebras, have leaned into their unique hardware architecture to provide "premium" low-latency tokens. …