Search: model rollout

Paper page - Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

…On top of this harness, On-policy Data Evolution (ODE) runs a closed-loop data generator that refines itself across rounds from rollouts of the policy being trained. This per-round refinement…

May 13, 2026

Paper page - Draft-OPD: On-Policy Distillation for Speculative Draft Models

…assisted rollouts and error replay. (Generated by Qwen/Qwen2.5-Coder-32B-Instruct.) Speculative decoding accelerates large language model inference by pairing a target model with a lightweight draft model whose proposed…

Jun 2, 2026

Paper page - Agentic AI Systems Should Be Designed as Marginal Token Allocators

…a router that decides which model answers, an agent that decides whether to plan, act, verify, or defer, a serving stack that decides how to produce each token, and a training pipeline…

May 5, 2026

Paper page - ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

…and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision-draft-reflect rollout with reinforcement learning (RL), assigning terminal driving reward…

May 8, 2026

Paper page - A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

…Existing approaches to such process credit assignment either depend on separate external process reward models that introduce additional consumption, or tree-based structural rollout that merely redistributes the outcome signal while constraining…

May 8, 2026

Paper page - minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

…Interactive world models require controllable, causal, and low-latency rollout, which in practice demands a full pipeline spanning data construction, controllable fine-tuning, autoregressive training, few-step distillation, and streaming inference. In…

May 29, 2026

Paper page - TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Paper page - Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

Paper page - RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Paper page - Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance