
Why the rush?

One of the defining characteristics of SRAM-heavy architectures from Groq and its rival Cerebras is that they are very fast at running LLM inference workloads, routinely achieving generation rates above 500 and even 1,000 tokens a second. The faster Nvidia can generate tokens, the faster code assistants and AI agents can act. But this kind of speed also opens the door to what Huang describes as test-time scaling: letting "reasoning" models generate more "thinking" tokens so that they produce smarter, more accurate results. So, the faster you can generate tokens, the les…
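The trade-off behind test-time scaling comes down to simple arithmetic: decode time is roughly the number of tokens divided by the generation rate. The sketch below makes that concrete. It assumes a constant decode rate and ignores prefill, queuing and network latency. The 10,000-token thinking budget, the 30-second latency budget and the 50 tokens/s baseline are hypothetical illustration values. Only the 500 and 1,000 tokens/s figures come from the text above.

```python
# Hypothetical values for illustration only; not taken from the article.
THINKING_TOKENS = 10_000   # assumed size of a reasoning model's "thinking" output
LATENCY_BUDGET_S = 30      # assumed time a user or agent is willing to wait


def generation_time_s(tokens: int, tokens_per_s: float) -> float:
    """Seconds to decode `tokens` at a constant rate (ignores prefill, queuing, network)."""
    if tokens_per_s <= 0:
        raise ValueError("generation rate must be positive")
    return tokens / tokens_per_s


def tokens_within(budget_s: float, tokens_per_s: float) -> int:
    """How many tokens fit in a fixed latency budget at a constant rate."""
    if tokens_per_s <= 0:
        raise ValueError("generation rate must be positive")
    return int(budget_s * tokens_per_s)


if __name__ == "__main__":
    # 500 and 1,000 tok/s are the rates cited for SRAM-heavy chips;
    # 50 tok/s is an assumed slower baseline for comparison.
    for rate in (50, 500, 1_000):
        t = generation_time_s(THINKING_TOKENS, rate)
        n = tokens_within(LATENCY_BUDGET_S, rate)
        print(f"{rate:>5} tok/s: {t:6.1f} s for {THINKING_TOKENS} tokens, "
              f"{n} tokens within {LATENCY_BUDGET_S} s")
```

Read one way, a faster chip shortens the wait for a fixed amount of reasoning. Read the other way, it lets a model "think" for many more tokens within the same response time, which is the lever test-time scaling relies on.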

A closer look at Nvidia's Groq-powered LPX rack systems