Search: AI rollouts to apps

Paper page - LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

…directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward…

May 14, 2026

Paper page - RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

…Yanzuo Lu. Abstract: RAVEN enables real-time video generation through causal autoregressive extrapolation with improved training alignment, while CM-GRPO enhances performance via reinforcement learning applied to consistency model sampling. AI-generated…

May 15, 2026

Paper page - Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

…those whose answers never appear in 128 pre-RL samples (only ~18% of training data) drive ~83% of the gain, since rare correct rollouts still emerge during training and get reinforced. Together…

May 13, 2026

Paper page - Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

…Starting from a pretrained VLA policy, LWD closes the loop between deployment, shared physical experience, policy improvement, and redeployment by using autonomous rollouts and human interventions collected across a robot fleet. To…

May 4, 2026

Paper page - CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

…Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine…

May 13, 2026

Paper page - EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

…Rather than relying on a static recipe, EvoTrainer enables LLM policies and their training harnesses to evolve jointly over time. This is more than conventional AI development, it is AI evolution in…

Jun 11, 2026

Paper page - Rubric-based On-policy Distillation

…AI-generated summary On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic…

May 11, 2026

Paper page - DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

…existing reinforcement learning approaches in both training efficiency and final performance. AI-generated summary Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing…

May 15, 2026

Paper page - Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

…reduced latency and improved quality compared to existing chunk-wise approaches. AI-generated summary Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods…

May 15, 2026

Paper page - Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

…when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model loses effective training signals for these questions, wasting the training data and computational budget. While…

May 8, 2026
