Paper page - PREPING: Building Agent Memory without Tasks
LLM agents often need memory to solve tasks in new tool environments, but memory is usually built only after…
…Process-Reward Optimization for Computer Use Agents (2026)
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization (2026)
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis (2026…
…A Hierarchical Benchmark for Visual Website Development with Agent Verification (2026)
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing (2026)
Test-Driven AI Agent Definition (TDAD): Compiling Tool…
…Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents (2026)
ARIADNE: Agentic Reward-Informed Adaptive Decision Exploration via Blackboard-Driven MCTS for Competitive Program Generation (2026)
LLM as a Tool…
…Recent alternatives include agentic reasoning through code or tool calls, and latent reasoning with learnable hidden embeddings. However, agentic methods incur context-switching latency from external execution, while latent methods lack task…
…Measuring Reward Hacking in Long-Horizon Coding Agents (2026)
Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026)
Do Synthetic Trajectories Reflect Real Reward Hacking? A Systematic Study on…
…developing Claw-style personal agents with synthetic training data, verified workspaces, and benchmark evaluation.

AI-generated summary: Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states…
…Which is, I think, why the interpretable traces are the most durable contribution here — not as the agent's own verdict, but as the surface an external check (a human, a tool…
…trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents…
…Generated by Qwen/Qwen2.5-Coder-32B-Instruct.

LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories and tools, that shape task execution without changing…