Search: agentic tooling

Paper page - AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

…actions to Exploration, Implementation, Verification, or Orchestration based on trajectory history rather than tool identity alone. On AgentLens-Bench, the quality score separates passing trajectories into Lucky, Solid, and Ideal tiers and…

May 14, 2026

Paper page - CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

…Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Tool retrieval over large API catalogs is a core bottleneck for LLM agents: user queries arrive in colloquial, often underspecified language, while the…

May 29, 2026

Paper page - Advancing Creative Physical Intelligence in Large Multimodal Models

…Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing (2026); From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs (2026); MemEye: A Visual-Centric Evaluation…

May 28, 2026

Paper page - DAR: Deontic Reasoning with Agentic Harnesses

…DAR introduces an agentic setup where LLMs query statutes on demand through tools rather than receiving all rules in…

Jun 4, 2026

Paper page - MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

…We propose MapAgent, an industrial-grade agentic architecture that augments a vectorization backbone for specification-compliant lane-map production. Rather than merely adding an agent loop to map prediction, MapAgent couples backbone…

Jun 4, 2026

Paper page - CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

…This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool…

May 11, 2026

Paper page - AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

…An Open-Source Agentic Modeling Framework (2026); Auditing Agent Harness Safety (2026); SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety (2026); Security Risks in Tool-Enabled AI Agents: A Systematic Analysis…

May 29, 2026

Paper page - Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

…Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning (2026); Process Reward Agents for Steering Knowledge-Intensive Reasoning (2026); Learning Agent-Compatible Context Management for Long-Horizon Tasks (2026…

Jun 6, 2026

Paper page - SePO: Self-Evolving Prompt Agent for System Prompt Optimization

…Collaborative Agents for Dynamic Prompt Optimization in Large Language Models (2026); SPEAR: Code-Augmented Agentic Prompt Optimization (2026); Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with…

Jun 5, 2026

Paper page - MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

…Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation (2026); AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents (2026); Terminal-World: Scaling Terminal-Agent Environments via Agent Skills (2026…

Jun 2, 2026
