TurboQuant tackles the hidden memory problem that's been limiting your local LLMs
…During inference, every attention layer caches a key vector and a value vector for each token it processes, so it doesn't have to recompute them when generating later tokens. The memory required follows a…
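To make the caching idea concrete, here is a minimal toy sketch in plain Python. It is not TurboQuant and not any real library's API: the class `ToyKVCache`, its parameters, and the 2-bytes-per-element figure (16-bit storage) are all assumptions chosen for illustration. It only shows that storing one key and one value vector per token, per layer, makes the cache grow with every token processed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical cache: one key and one value vector per token, per layer."""
    num_layers: int
    vector_dim: int              # length of each key/value vector (assumed equal)
    bytes_per_element: int = 2   # assumption: 16-bit storage
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def __post_init__(self):
        # One independent list of cached vectors per attention layer.
        self.keys = [[] for _ in range(self.num_layers)]
        self.values = [[] for _ in range(self.num_layers)]

    def append(self, layer, key, value):
        """Store the key and value vectors for one new token in one layer."""
        if len(key) != self.vector_dim or len(value) != self.vector_dim:
            raise ValueError("key and value must have length vector_dim")
        self.keys[layer].append(key)
        self.values[layer].append(value)

    def size_bytes(self):
        """Total bytes held, counting every cached element in every layer."""
        elements = sum(len(vec) for layer in self.keys + self.values for vec in layer)
        return elements * self.bytes_per_element


def fill(cache, num_tokens):
    """Simulate processing num_tokens tokens with placeholder vectors."""
    for _ in range(num_tokens):
        for layer in range(cache.num_layers):
            zeros = [0.0] * cache.vector_dim
            cache.append(layer, zeros, list(zeros))


cache = ToyKVCache(num_layers=4, vector_dim=8)
fill(cache, 10)
size_after_10 = cache.size_bytes()
fill(cache, 10)
size_after_20 = cache.size_bytes()
```

In this toy setup, doubling the number of processed tokens doubles the bytes held, which is why long contexts put pressure on memory even when the model weights themselves fit comfortably.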