I tried a new 8B local LLM, and its design might be the biggest shift since DeepSeek R1
…Plus, because CCA compresses parameters, cache, and FLOPs together by the same factor, the user can dial the compression toward either memory or compute, depending on what their hardware is short on…
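To make the "same factor" claim concrete, here is a rough back-of-envelope sketch. It does not model how CCA works internally. It takes a plain multi-head attention layer as the baseline and divides each cost by one shared factor, as the excerpt describes. The names `AttentionCost`, `baseline_cost`, and `compress`, and the example sizes (`d_model=4096`, `seq_len=8192`, factor 4), are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AttentionCost:
    params: float          # attention weights per layer
    kv_cache_bytes: float  # KV cache per layer at the given context length
    flops: float           # score + weighted-sum FLOPs per layer, full sequence


def baseline_cost(d_model: int, seq_len: int, bytes_per_value: int = 2) -> AttentionCost:
    """Approximate per-layer costs of standard multi-head attention (assumed baseline)."""
    # Q, K, V and output projections: four d_model x d_model matrices.
    params = 4 * d_model * d_model
    # K and V each hold seq_len x d_model values (2 bytes each for fp16/bf16).
    kv_cache_bytes = 2 * seq_len * d_model * bytes_per_value
    # QK^T and attention-weighted V: roughly 2 * seq_len^2 * d_model FLOPs each.
    flops = 4 * seq_len * seq_len * d_model
    return AttentionCost(params, kv_cache_bytes, flops)


def compress(cost: AttentionCost, factor: float) -> AttentionCost:
    """Apply one shared compression factor to all three costs (the excerpt's claim)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return AttentionCost(
        cost.params / factor,
        cost.kv_cache_bytes / factor,
        cost.flops / factor,
    )


if __name__ == "__main__":
    base = baseline_cost(d_model=4096, seq_len=8192)  # hypothetical sizes
    small = compress(base, factor=4)                  # hypothetical factor
    print(f"params:   {base.params:.3e} -> {small.params:.3e}")
    print(f"KV bytes: {base.kv_cache_bytes:.3e} -> {small.kv_cache_bytes:.3e}")
    print(f"FLOPs:    {base.flops:.3e} -> {small.flops:.3e}")
```

Because the three quantities shrink together in this simplified model, whichever one is the bottleneck on a given machine, memory or compute, gets the same relative relief.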
