TurboQuant is a big deal, but it won’t end the memory crunch
… These quantization methods also tend to introduce their own performance overheads, and this is where TurboQuant's innovations really lie. Google claims that it can achieve quality comparable to BF16 using just 3.5 bits, while also mitigating those pesky overheads. …
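To put that 3.5-bit figure in perspective, here's some back-of-the-envelope arithmetic comparing raw storage at BF16's 16 bits per value against an average of 3.5 bits per value. This is a sketch under stated assumptions, not a model of TurboQuant itself: the one-billion-value count is a hypothetical round number, and the calculation covers only storage size. It ignores compute overheads and any per-block metadata a real quantization scheme might carry.

```python
def footprint_bytes(num_values: int, bits_per_value: float) -> float:
    """Raw storage in bytes for num_values values at a given average bit width."""
    return num_values * bits_per_value / 8

# Hypothetical workload: one billion stored values (illustrative, not from the article)
N = 1_000_000_000

bf16 = footprint_bytes(N, 16)   # BF16 stores 16 bits per value
q35 = footprint_bytes(N, 3.5)   # the average bit width Google cites for TurboQuant

print(f"BF16:      {bf16 / 1e9:.2f} GB")   # BF16:      2.00 GB
print(f"3.5-bit:   {q35 / 1e9:.4f} GB")    # 3.5-bit:   0.4375 GB
print(f"Reduction: {bf16 / q35:.2f}x")     # Reduction: 4.57x
```

In other words, even if quality holds up perfectly, the best case is roughly a 4.6x cut in the memory those values occupy. That's substantial, but it's a constant factor rather than a change in how memory demand scales.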