Google's TurboQuant cuts AI working memory by 6x, but it won't fix the global RAM shortage
…including LongBench, Needle in a Haystack, ZeroSCROLLS, RULER, and L-Eval, using the open-source Gemma and Mistral LLMs. The results show that TurboQuant could make AI cheaper to run, reducing its…
