Search: Performance targets

Paper page - Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

… On diverse reasoning tasks and LLM backbones, LPO consistently improves training performance over typical policy gradient baselines under matched targets, while intrinsically preserving optimization stability and response diversity. …

May 11, 2026

Paper page - MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

… On established Multimodal Tabular Learning benchmarks, we show that tuning the embeddings to the task improves performance. …

May 14, 2026

Paper page - FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments

… Experiments across open-source LLMs demonstrate performance gains up to 27% across evaluation modes over standard baselines. …

Apr 30, 2026

Paper page - Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

… arXiv:2604.26326. Published on May 10. Submitted by Bolian Li on May 12. Purdue University. Authors: Bolian Li, Yifan Wang, … Abstract: Entrocraft, a rejection-sampling approach for reinforcement learning, addres… …

May 12, 2026

Paper page - From Pixels to Concepts: Do Segmentation Models Understand What They Segment?

…the target region and ground-truth mask are preserved, while attributes such as surface appearance, context, or material composition are modified to introduce misleading semantic cues. The benchmark contains 2,146 paired…

May 14, 2026

Paper page - From Web to Pixels: Bringing Agentic Search into Visual Perception

… Experiments show that Pixel-Searcher achieves the strongest open-source performance across all three task views, while failures mainly arise from evidence acquisition, identity resolution, and visual instance binding. …

May 13, 2026

Paper page - Anisotropic Modality Align

…This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in…

May 11, 2026

Paper page - Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

… At present, addressing these questions typically requires costly training runs whose aggregate performance metrics obscure the dynamics at the level of individual tokens. …

May 12, 2026

Paper page - PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning

… Although frontier models achieve relatively strong performance on direct expression evidence, performance drops substantially on functional, indirect, and weak-support evidence, with evidence-type confusion emerging as a dominant failure mode. …

May 12, 2026

Paper page - PREPING: Building Agent Memory without Tasks

… Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, with deployment cost 2.99× lower on AppWorld and 2.23× lower… …

May 15, 2026
