Search: model rollout

Paper page - Reinforcing Multimodal Reasoning Against Visual Degradation

…AI-generated summary: Reinforcement Learning has significantly advanced the reasoning capabilities of Multimodal Large Language Models (MLLMs), yet the resulting policies remain brittle against real-world visual degradations such as blur…

May 12, 2026

Paper page - ClawGym: A Scalable Framework for Building Effective Claw Agents

…We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that…

Apr 30, 2026

Paper page - ESPO: Early-Stopping Proximal Policy Optimization

…rollout tokens cumulatively. ESPO (Early-Stopping Proximal Policy Optimization) tackles a key waste in RL training of reasoning LLMs: when a model errs…

Jun 2, 2026

Paper page - AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

…On the system side, an asynchronous design overlaps rollout, gradient update, and policy refresh across iterations, paired with two web-agent-specific adaptations, namely an everlasting rollout pool and lightweight screenshot handling…

Jun 9, 2026

Paper page - GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation

…GE-Sim 2.0 tops the public WorldArena leaderboard at only 2B parameters, outperforming both dedicated robotic world models and closed-source general video generators, and policies trained against its rollouts and…

May 28, 2026

Paper page - StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

…GRPO-style rollout, enhanced by diverse strategy sampling and critical self-judgment. Does it work? StraTA improves both sample efficiency and final performance, outperforming both frontier closed-source models and prior RL…

May 8, 2026

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

…Could you share learning curves or metrics for the 120B model? The article mentions both vLLM and SGLang. Which engine is used for rollouts in the final experiments? If different engines are…

Jan 27, 2026

Paper page - Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

…Across a range of self-distillation settings and external teacher models, we observe that distillation guidance exhibits substantially higher alignment with the ideal on incorrect rollouts than on correct ones, where…

May 12, 2026

Paper page - The DAWN of World-Action Interactive Models

arXiv:2605.11550. Published on May 12; submitted by LiangYao (COWARobot) on May 14. Authors include Liang Yao. Abstract: World-Action Interactive Models (WAIMs) jointly…

May 14, 2026

Paper page - Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

…On top of this harness, On-policy Data Evolution (ODE) runs a closed-loop data generator that refines itself across rounds from rollouts of the policy being trained. This per-round refinement…

May 13, 2026
