Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
…learning with verifiable rewards by using Lorem Ipsum perturbations to enhance exploration in large language model training. Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly…