Unweight: how we compressed an LLM 22% without sacrificing quality
… The same Huffman-compressed model bundle can serve both distribution and inference: For distribution, Huffman encoding maximizes compression (~22% total model size reduction), reducing transfer times when shipping models across the network. …
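To make the size-reduction figure concrete, here is a minimal sketch of how Huffman coding turns a skewed symbol distribution into fewer bits. It is a generic illustration, not Unweight's actual encoder: the function names (`huffman_code_lengths`, `compressed_bits`, `size_reduction`) and the toy input are assumptions made for this example, and the ratio it produces has no relation to the ~22% figure.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def compressed_bits(data: bytes) -> int:
    """Total payload size in bits after Huffman coding (code table excluded)."""
    freq = Counter(data)
    lengths = huffman_code_lengths(data)
    return sum(freq[s] * lengths[s] for s in freq)


def size_reduction(data: bytes) -> float:
    """Fraction of the original 8-bit-per-symbol size that is saved."""
    return 1 - compressed_bits(data) / (8 * len(data))


# Hypothetical toy input: frequent symbols get short codes, rare ones long codes.
sample = b"a" * 8 + b"b" * 4 + b"c" * 2 + b"d" * 2
print(compressed_bits(sample), size_reduction(sample))
```

At a fixed network bandwidth, transfer time scales with payload size, so a bundle that is about 22% smaller takes roughly 22% less time to ship.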