Search: performance benchmarking

Paper page - Reinforcing Multimodal Reasoning Against Visual Degradation

…We propose ROMA, an RL fine-tuning framework that modifies the optimization dynamics to reinforce reasoning against visual degradation while preserving clean-input performance. A dual-forward-pass strategy uses teacher forcing…

May 12, 2026

Paper page - From Web to Pixels: Bringing Agentic Search into Visual Perception

…Bringing Agentic Search into Visual Perception. Published on May 12; submitted by taesiri on May 13. Abstract: Researchers introduce WebEye, a benchmark for object localization requiring external knowledge resolution, and Pixel…

May 13, 2026

Paper page - Count Anything

…To support this setting, we construct CLOC, a Cross-domain Large-scale Object Counting dataset that reorganizes diverse public data sources into a unified benchmark. CLOC covers six visual domains: General Scene…

Jun 1, 2026

Paper page - MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

…Bohan Lyu, Jiaru Zhang, Qixin Xu, Junlin Yang, et al. Abstract: Current AI agents struggle to invent generalizable and scalable machine learning methods, relying more on engineering tuning than true method discovery, with performance…

May 12, 2026

Paper page - TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

…During inference, a fast capability loop updates agent expertise using trajectory-level feedback, while a slow meta-LLM-driven topology loop performs agents' birth-death operations on MAS, including edge edit, agent…

May 12, 2026

Paper page - Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

…Yao Du, et al. Abstract: A distribution-aware reinforcement learning framework improves multimodal large language models' numerical regression performance on long-tailed distributions through batch-level comparison-based supervision. [AI-generated summary] Multimodal large…

May 12, 2026

Paper page - Teaching Language Models to Think in Code

…Hyeon Hwang, et al. Abstract: The ThinC framework enables mathematical problem solving in which code serves as the primary reasoning mechanism rather than a verification tool, demonstrating superior performance on math benchmarks. [AI-generated summary] Tool…

May 13, 2026

Paper page - Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

…Shijue Huang, et al. Abstract: A visual-native agent harness with an image bank reference protocol enables reusable intermediate visual evidence and closed-loop data generation that improves multimodal deep search performance across multiple benchmarks…

May 13, 2026

Paper page - Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization

…Xingyuan Hua, et al. Abstract: Agents use variational inference to evaluate exploratory actions and explore selectively, only when uncertainty is high, improving performance on text-based and GUI-based benchmarks. [AI-generated summary] Recent…

May 14, 2026

Paper page - ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

…Ziyu Guo, et al. Abstract: ATLAS presents a visual reasoning framework that combines agentic operations and latent representations using functional tokens, enabling efficient training and improved performance on complex benchmarks. [AI-generated summary] Visual…

May 15, 2026
