Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
…training methods for large language models are analyzed through a unified framework that decomposes the rollout process into four stages (generation, filtering, control, and replay), enabling systematic evaluation and improvement across reasoning tasks.
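The four-stage decomposition can be illustrated with a minimal Python sketch. This is not the survey's method. The toy arithmetic task, the names (`Rollout`, `generate`, `keep_group`, `ReplayBuffer`, `collect`), and the concrete choice made for each stage are all hypothetical. Filtering uses one common heuristic: a group whose rewards are all identical is discarded, because group-relative advantages would be zero. Control is reduced to a fixed rollout budget, and replay is a bounded FIFO buffer.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    response: int
    reward: float


# Hypothetical toy task: each prompt has exactly one correct integer answer.
TASKS = {"1+1": 2, "2+3": 5, "4+4": 8}


def generate(prompt, n, rng):
    """Generate: sample n responses from a toy policy and score each with a verifier."""
    target = TASKS[prompt]
    group = []
    for _ in range(n):
        answer = target + rng.choice([-1, 0, 0, 1])  # toy policy, correct ~50% of the time
        group.append(Rollout(prompt, answer, 1.0 if answer == target else 0.0))
    return group


def keep_group(group):
    """Filter: keep a group only if its rewards differ; all-equal groups carry no relative signal."""
    return len({r.reward for r in group}) > 1


class ReplayBuffer:
    """Replay: retain recent informative rollouts so later updates can reuse them."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest rollouts are evicted first

    def add(self, rollouts):
        self.items.extend(rollouts)

    def sample(self, k, rng):
        return rng.sample(list(self.items), min(k, len(self.items)))


def collect(prompts, group_size, budget, buffer, rng):
    """Control: stop sampling new groups once the rollout budget would be exceeded."""
    used, fresh = 0, []
    for prompt in prompts:
        if used + group_size > budget:
            break
        group = generate(prompt, group_size, rng)
        used += group_size
        if keep_group(group):
            fresh.extend(group)
    buffer.add(fresh)
    return fresh, used


rng = random.Random(0)
buffer = ReplayBuffer(capacity=32)
fresh, used = collect(list(TASKS), group_size=4, budget=8, buffer=buffer, rng=rng)
replayed = buffer.sample(2, rng)
print(f"rollouts used: {used}, kept: {len(fresh)}, replayed: {len(replayed)}")
```

In a real training loop the policy would be an LLM, and the kept and replayed rollouts would feed a policy-gradient update. Each stage here is a placeholder that a specific strategy could replace without changing the other three.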