Search: AI reasoning math

Paper page - Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

…The following papers were recommended by the Semantic Scholar API: Verifier-Backed Hard Problem Generation for Mathematical Reasoning (2026); ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning (2026…

May 18, 2026

Paper page - Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

…AI-generated summary: Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for…

May 8, 2026

Paper page - FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

…Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis (2026); Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design (2026); Verifier-Backed Hard Problem Generation for Mathematical Reasoning (2026…

May 15, 2026

Paper page - Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

…AI-generated summary: Reinforcement learning (RL) has become a central post-training tool for improving the reasoning abilities of large language models (LLMs). In these systems, the rollout, the trajectory sampled from…

May 6, 2026

Paper page - Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

…We show that, initialized with weak agents, the economy produces emergent multi-step reasoning strategies and outperforms stronger monolithic baselines across five agentic tasks, including mathematical reasoning, financial research, scientific research, accelerator…

Jun 4, 2026

Paper page - Large Language Models Explore by Latent Distilling

…Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, showing superior or comparable performance to strong stochastic and heuristic baselines. Notably, ESamp achieves robust generalization across mathematics…

Apr 29, 2026

Paper page - PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

Paper page - Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Paper page - Exploring Autonomous Agentic Data Engineering for Model Specialization

Paper page - Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders