Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
…Xingyuan Hua

Abstract (AI-generated summary): Agents use variational inference to evaluate exploratory actions and selectively explore only when uncertainty is high, improving performance on text-based and GUI-based benchmarks.

Recent…