KL for a KL: On-Policy Distillation with Control Variate Baseline
… The following papers were recommended by the Semantic Scholar API: Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes (2026); Hybrid Policy Distillation for LLMs (2026); Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation (2026); A Su… …