Search: performance benchmarking

Paper page - Counting as a minimal probe of language model reliability

…AI-generated summary: Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such success reflects…

May 5, 2026

Paper page - AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

…However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent behavior within the research process. To address this gap, we present AutoMedBench, a workflow-aware benchmark for autonomous…

Jun 3, 2026

Paper page - Task-Focused Memorization for Multimodal Agents

…determine what information to store in long-term memory for multimodal agents, improving performance on streaming video benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct. Long-term memory is essential for…

Jun 1, 2026

Paper page - FeatCal: Feature Calibration for Post-Merging Models

…merging leads to FeatCal, a calibration method that reduces performance gaps through layer-wise weight updates without gradient descent, achieving superior benchmark results and efficiency. AI-generated summary: Model merging combines task…

May 14, 2026

Paper page - Step-level Optimization for Efficient Computer-use Agents

…Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this…

May 1, 2026

Paper page - PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents

…The central point of PAI-2 design is its ability to perform adaptive, iterative information search, guided by extracted entities, matched graph vertices and generated clue-queries. Conducted evaluation over six benchmarks…

May 14, 2026

Paper page - Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

…Probing Spatial Representation in Vision-Language Models (2026); Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision (2026); SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation (2026); CaMo: Camera Motion…

Jun 1, 2026

Top stories

Paper page - Benchmarking Visual State Tracking in Multimodal Video Understanding

Paper page - SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

Paper page - Advancing Creative Physical Intelligence in Large Multimodal Models

Paper page - Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Paper page - Rethinking Memory as Continuously Evolving Connectivity