Co-Evolving Policy Distillation
…AI-generated summary: RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying…