Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
…On top of this harness, On-policy Data Evolution (ODE) runs a closed-loop data generator that refines itself across rounds from rollouts of the policy being trained. This per-round refinement…
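To make the closed-loop idea concrete, here is a minimal toy sketch of a generator that is refined once per round from rollout outcomes. It is not the ODE implementation: `ToyGenerator`, its single `difficulty` knob, the 0.5 target success rate, the fixed `skill` value standing in for the policy, and the `rollout` and `evolve` helpers are all hypothetical names and assumptions introduced only for illustration. The policy update step is omitted.

```python
from dataclasses import dataclass


@dataclass
class ToyGenerator:
    """Hypothetical generator with one knob: task difficulty in [0, 1]."""
    difficulty: float = 0.5
    step: float = 0.1

    def generate(self, n: int) -> list[float]:
        # Each "task" is represented only by its difficulty (an assumption).
        return [self.difficulty] * n

    def refine(self, success_rate: float, target: float = 0.5) -> None:
        # Assumed refinement rule: harder tasks if the policy succeeds
        # too often, easier tasks if it succeeds too rarely.
        if success_rate > target:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif success_rate < target:
            self.difficulty = max(0.0, self.difficulty - self.step)


def rollout(task_difficulty: float, skill: float) -> bool:
    # Stand-in for a policy rollout: the task is solved if it is not
    # harder than the policy's (fixed, hypothetical) skill level.
    return task_difficulty <= skill


def evolve(generator: ToyGenerator, skill: float, rounds: int,
           tasks_per_round: int = 4) -> list[tuple[float, float]]:
    """Run the closed loop: generate, roll out, refine the generator."""
    history = []
    for _ in range(rounds):
        tasks = generator.generate(tasks_per_round)
        outcomes = [rollout(t, skill) for t in tasks]
        success_rate = sum(outcomes) / len(outcomes)
        generator.refine(success_rate)  # feedback from on-policy rollouts
        history.append((success_rate, generator.difficulty))
    return history


if __name__ == "__main__":
    gen = ToyGenerator()
    for rate, diff in evolve(gen, skill=0.8, rounds=6):
        print(f"success_rate={rate:.2f} next_difficulty={diff:.2f}")
```

In this toy setting the generated difficulty climbs toward the policy's skill level and then oscillates around it, which is one simple way a generator could stay matched to the current policy; the actual refinement signal used by ODE may differ.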