Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
…AI-generated summary: On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental…