LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models
…directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward…
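The excerpt above does not give the estimator or the reward formula, so the following is only a minimal sketch of one way the idea could look, not the paper's method. It assumes the per-problem target is an exponential moving average of the token lengths of correct rollouts, and that the symmetric efficiency reward falls off linearly with the relative distance from that target in either direction. The names `TargetLengthTracker`, `symmetric_efficiency_reward`, and the `momentum` parameter are all hypothetical.

```python
class TargetLengthTracker:
    """Hypothetical online estimator of a per-problem target length.

    Assumption: the target is an exponential moving average (EMA) of the
    lengths of *correct* rollouts only; the source does not specify this.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.targets = {}  # problem_id -> current target length

    def update(self, problem_id, lengths, correct):
        """Fold one batch of rollouts for a problem into its target."""
        ok = [n for n, c in zip(lengths, correct) if c]
        if not ok:
            # No correct rollout in this batch: keep the previous target.
            return self.targets.get(problem_id)
        batch_mean = sum(ok) / len(ok)
        prev = self.targets.get(problem_id)
        if prev is None:
            new = batch_mean
        else:
            new = self.momentum * prev + (1.0 - self.momentum) * batch_mean
        self.targets[problem_id] = new
        return new


def symmetric_efficiency_reward(length, target):
    """Assumed reward shape: 1 at the target, decaying linearly to 0.

    Responses shorter and longer than the target by the same amount
    receive the same reward, which is what "symmetric" is taken to mean.
    """
    if target <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(length - target) / target)


if __name__ == "__main__":
    tracker = TargetLengthTracker(momentum=0.5)
    target = tracker.update("p1", [100, 200, 300], [True, True, False])
    for n in (100, 150, 200):
        print(n, symmetric_efficiency_reward(n, target))
```

Under these assumptions, a rollout that undershoots the target is penalized as much as one that overshoots it by the same margin, so the reward discourages both padded and truncated reasoning rather than simply favoring shorter outputs.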