SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
… SlimSpec achieves a 4-5× acceleration over the standard LM-head architecture while maintaining a competitive acceptance length, surpassing existing methods by up to 8-9% in end-to-end speedup. Our method requires minimal adjustments to training and inference pipelines. …
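
To make the "low-rank LM-head" idea in the title concrete, here is a minimal sketch of a generic low-rank output projection. It is an illustration under stated assumptions, not SlimSpec's actual implementation: the excerpt does not describe the factorization, so the two-factor form `A` (d_model × rank) followed by `B` (rank × vocab), the helper names, and the toy sizes are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def full_head_cost(d_model, vocab):
    """Multiply-adds per token for a dense d_model x vocab LM head."""
    return d_model * vocab


def low_rank_head_cost(d_model, vocab, rank):
    """Multiply-adds per token for a rank-r head: (d_model x r), then (r x vocab)."""
    return d_model * rank + rank * vocab


def random_matrix(rows, cols, rng):
    """Small random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, vocab, rank = 8, 32, 2  # toy sizes, chosen only for illustration

A = random_matrix(d_model, rank, rng)  # hypothetical down-projection factor
B = random_matrix(rank, vocab, rng)    # hypothetical up-projection factor
h = random_matrix(1, d_model, rng)     # one hidden state from the draft model

# Low-rank path: project to the small rank first, then expand to the vocabulary.
logits_low_rank = matmul(matmul(h, A), B)

# Dense path with the same effective weight matrix W = A @ B, for comparison.
logits_dense = matmul(h, matmul(A, B))

max_diff = max(abs(x - y) for x, y in zip(logits_low_rank[0], logits_dense[0]))
cost_ratio = full_head_cost(d_model, vocab) / low_rank_head_cost(d_model, vocab, rank)
```

Both paths produce the same logits up to floating-point rounding, but the low-rank path performs fewer multiply-adds whenever `rank` is well below both `d_model` and `vocab`. The cost ratio for these toy sizes says nothing about the speedups reported above, which depend on the real model dimensions and hardware.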