Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
…High Accuracy Agentic Post-Training at Low Compute Cost (2026)
Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR (2026)
Learning Adaptive LLM Decoding (2026)
Prompt replay: speeding up…