Search: capability limitations

Paper page - Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

… These limitations hinder reliable assessment of both image editing models and reward models. …

May 14, 2026

Paper page - Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

… This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. …

May 15, 2026

Paper page - FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

… This training-time attention dilution (the starvation of content tokens in the attention distribution) weakens the gradient signal, limiting the model's ability to learn robust long-context capabilities. We introduce FocuSFT…

May 13, 2026

Paper page - BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

… A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models. Published on May 7. Submitted by Xin Gao on May 8. University of California San Diego. Abstract: A…

May 8, 2026

Paper page - OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

Paper page - MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

Paper page - SEIF: Self-Evolving Reinforcement Learning for Instruction Following

Paper page - CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

Paper page - Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark

Paper page - A Benchmark for Interactive World Models with a Unified Action Generation Framework