Search: AI training practices

Paper page - Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

…learning with verifiable rewards by using Lorem Ipsum perturbations to enhance exploration in large language model training. AI-generated summary Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly…

May 8, 2026

Paper page - Large Language Models Explore by Latent Distilling

…Deferring the Distiller's training step (backward pass and weight update) to the CPU-bound post-processing intervals to hide the latency is a fantastic practical touch, keeping the throughput overhead under…

Apr 29, 2026

Paper page - Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

…AI-generated summary In settings where labeled verifiable training data is the binding constraint, each checked example should be allocated carefully. The standard practice is to use this data directly on the…

Paper page - Model Merging Scaling Laws in Large Language Models

…fixed budget--turning merging from heuristic practice into a computationally efficient, plannable alternative to multitask training. This suggests a scaling principle for distributed generative AI: predictable gains can be achieved by composing…

May 12, 2026

Paper page - Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

…beyond their training length. Extensive ablation studies validate the contribution of each component and provide guidance for practical configuration. Our code is publicly available at https://github.com/Musubi-ai/Mela Get…

May 12, 2026

Paper page - Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

…AI-generated summary Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they…

May 12, 2026

Paper page - AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

…We present AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to improve the exploration-exploitation trade-off. Since in agentic RL the environment is typically…

May 11, 2026

Paper page - Fast Byte Latent Transformer

…We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained…

May 12, 2026

Paper page - SkillOS: Learning Skill Curation for Self-Evolving Agents

…Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect…

May 8, 2026

Paper page - Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

…that trains a single policy to simultaneously evolve skill selection, utilization, and distillation capabilities using a shared task-outcome objective, demonstrating superior performance over existing baselines in complex task environments. AI-generated…

May 8, 2026
