Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
…Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models (2026)
MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models (2026)
Too Correct to Learn: Reinforcement Learning on Saturated…