Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
…Rohan Surana, …, Zhenwei Tang, …, Kuan-Hao Huang, …

Abstract: Reinforcement learning post-training methods for large language models are analyzed through a unified framework that decomposes rollout processes into generation, filtering, control, and…