SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
…language model head while maintaining full vocabulary support and achieving significant speedup with minimal pipeline changes. (AI-generated summary)

Speculative decoding speeds up autoregressive generation in Large Language Models (LLMs) through a…