A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
…AI-generated summary: Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool calls within…