Paper page - Self-Distilled Agentic Reinforcement Learning
…Beyond KL Matching via Reward Regularization (2026)
Self-Distilled RLVR (2026)
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting (2026)
Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation…
