Paper page - Rubric-based On-policy Distillation
…AI-generated summary: On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic…