Search: AI policy and controls

Paper page - KL for a KL: On-Policy Distillation with Control Variate Baseline

… The following papers were recommended by the Semantic Scholar API: Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes (2026); Hybrid Policy Distillation for LLMs (2026); Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation (2026); A Su… …

May 15, 2026

Paper page - Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

arxiv:2604.26091. Published on Apr 28; submitted by Poof on Apr 30; DXRG AI Inc. Abstract: Autonomous language-model agents managing real cryptocurrency trades demonstrated high reliability through … …

Apr 30, 2026

Paper page - T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

… To address this issue, we propose Token- and Turn-level Policy Optimization (T^2PO), an uncertainty-aware framework that explicitly controls exploration at fine-grained levels. …

May 5, 2026

Paper page - Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

… The following papers were recommended by the Semantic Scholar API: PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners (2026); Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR (2026); On-Policy Distillation with Best-of-N Teac… …

May 14, 2026

Paper page - LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

… Three modules in LiSA: ① Broad policy abstraction — turn sparse failures into reusable policies ② Conflict-aware local policies — preserve boundary cues in mixed-label regions where a single broad rule would overgeneralize ③ Evidence-aware confidence gating — Beta posterior lower bound, so "validat… …

May 15, 2026

Paper page - When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

… The following papers were recommended by the Semantic Scholar API: Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration (2026); PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners (2026); GRPO-VPS: Enhancing Group Relative Policy Op… …

May 7, 2026

Paper page - Recovering Hidden Reward in Diffusion-Based Policies

… The following papers were recommended by the Semantic Scholar API: Flow Matching Policy with Entropy Regularization (2026); ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching (2026); Truncated Rectified Flow Policy for Reinforcement Learning with One-Step … …

May 8, 2026

Paper page - Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

… AI-generated summary: Reinforcement learning (RL) has enabled complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing continued gains as RL training scales. …

May 12, 2026

Paper page - AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

… The evaluation shows three main results: learned routing reaches a higher-quality operating point than a fixed pipeline baseline on coordination-heavy classes; skip:X isolates topology compression as a meaningful part of the substrate; and warm-started policy graphs can reduce exploration cost whil… …

May 28, 2026

Paper page - RLDX-1 Technical Report

… The following papers were recommended by the Semantic Scholar API: Modular Sensory Stream for Integrating Physical Feedback in Vision-Language-Action Models (2026); ProgressVLA: Progress-Guided Diffusion Policy for Vision-Language Robotic Manipulation (2026); $M^2$-VLA: Boosting Vision-Language Models for… …