Search: AI model rollout

Paper page - Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

…learning improves large language model recall of parametric knowledge by redistributing probability mass toward correct answers, with gains driven primarily by reinforcing rare but learnable examples. AI-generated summary Reinforcement learning (RL…

May 13, 2026

Paper page - Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

…Starting from a pretrained VLA policy, LWD closes the loop between deployment, shared physical experience, policy improvement, and redeployment by using autonomous rollouts and human interventions collected across a robot fleet. To…

May 4, 2026

Paper page - CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

…multi-shot video generation by addressing limitations of autoregressive models through causal modeling, dynamic memory routing, and real-time distillation techniques. AI-generated summary Autoregressive video generation aims at real-time, open…

May 13, 2026

Paper page - Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

…AI-generated summary End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive…

May 28, 2026

Paper page - Learning Agentic Policy from Action Guidance

…Yuxiang Ji, … Abstract: Agentic reinforcement learning for large language models leverages action data from human interactions as reference guidance to improve exploration and reduce dependence on costly supervised fine-tuning. AI-generated…

May 14, 2026

Paper page - IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

…An Advanced World Action Model for Robot Control (2026) Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation (2026) Being-H0.7: A Latent World-Action Model from Egocentric…

May 15, 2026

Paper page - AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

…learning and backward simulation techniques. AI-generated summary Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps…

May 14, 2026

Paper page - DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

…Quanhao Li, … Abstract: DiffusionOPD enables efficient multi-task training for diffusion models through online policy distillation, outperforming existing reinforcement learning approaches in both training efficiency and final performance. AI-generated summary Reinforcement…

May 15, 2026

Paper page - EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

…This is more than conventional AI development, it is AI evolution in action.…

Jun 11, 2026

Paper page - F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

…and ranking in a single autoregressive model using factorized group-relative policy optimization to address credit assignment challenges in end-to-end retrieval optimization. AI-generated summary Traditional retrieval pipelines optimize utility…

May 14, 2026
