A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
…Second, small groups degrade by design: (1) $|\mathcal{G}_{q,t}|=1$ is zeroed out, falling back to pure outcome reward; (2) small $G_t$ implies most rollouts terminated early, so outcome…
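
The singleton case in (1) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the turn-level signal is a mean-centered reward within each turn group $\mathcal{G}_{q,t}$; the function name `turn_group_advantages`, the dictionary layout, and the mean-centering itself are hypothetical and not taken from the paper. Under that assumption a group of size one has nothing to be compared against, so its turn-level term is set to zero and only the outcome-level signal remains.

```python
from statistics import mean


def turn_group_advantages(rewards):
    """Mean-center turn rewards within each (query, turn) group.

    `rewards` maps a hypothetical (query_id, turn_index) key to the list of
    turn-level rewards of the rollouts that reached that turn.
    Groups with a single member are zeroed out (assumed behavior, following
    the text's statement that |G_{q,t}| = 1 is zeroed out).
    """
    advantages = {}
    for key, group in rewards.items():
        if len(group) <= 1:
            # Singleton group: no peers to compare with, so no turn-level signal.
            advantages[key] = [0.0] * len(group)
        else:
            baseline = mean(group)
            advantages[key] = [r - baseline for r in group]
    return advantages


example = turn_group_advantages({
    ("q", 0): [1.0, 0.0, 0.5],  # three rollouts reached turn 0
    ("q", 3): [0.9],            # only one rollout reached turn 3
})
```

In this sketch the later turn, reached by a single rollout, contributes no turn-level term, which is the fallback to pure outcome reward that the text describes.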