The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs
arXiv:2605.08737
Published on May 9. Submitted by XinLi on May 14.
Nanyang Technological University
Authors: Xin Li, …
Abstract: On-policy distillation with reward extrapolation exhibits a safety thr…