Co-Evolving Policy Distillation
AI-generated summary: …Evolving Policy Distillation enables unified integration of multiple expert capabilities through parallel training and bidirectional policy distillation, outperforming existing methods in multi-modal reasoning tasks.

RLVR and OPD have…
