LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models
…Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on…
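
To make the length-targeting idea above concrete, here is a minimal sketch in Python. It is not the paper's formulation: the excerpt gives neither the target estimator nor the reward shape, so the choices below are assumptions for illustration only. The target is taken as the mean token length of the correct rollouts sampled for one problem, the reward decays linearly with relative distance from that target, and the names `estimate_target_length`, `symmetric_efficiency_reward`, and `tolerance` are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def estimate_target_length(
    lengths: Sequence[int], correct: Sequence[bool]
) -> Optional[float]:
    """Assumed estimator: mean length of the correct rollouts for one problem.

    Returns None when no rollout was correct, so no target can be formed.
    """
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    return mean(correct_lengths) if correct_lengths else None


def symmetric_efficiency_reward(
    length: int, target: float, tolerance: float = 0.5
) -> float:
    """Assumed reward shape: 1.0 at the target, falling linearly to 0.0.

    The penalty depends only on |length - target|, so a response that is
    too long (overthinking) and one that is equally too short
    (over-compression) receive the same reward. `tolerance` is the relative
    deviation at which the reward reaches zero.
    """
    deviation = abs(length - target) / target
    return max(0.0, 1.0 - deviation / tolerance)


# Four rollouts for one problem; the last one is incorrect and is ignored
# when forming the target.
lengths = [100, 200, 300, 400]
correct = [True, True, True, False]

target = estimate_target_length(lengths, correct)
rewards = [
    symmetric_efficiency_reward(n, target, tolerance=1.0) if ok else 0.0
    for n, ok in zip(lengths, correct)
]
```

In this sketch the rollouts of length 100 and 300 sit equally far from the target of 200 and therefore score the same, which is the symmetry the excerpt describes. Because the target is recomputed from each problem's own correct rollouts, it can differ from problem to problem rather than being a single fixed budget.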