Search: Performance discussion

Paper page - IntentGrasp: A Comprehensive Benchmark for Intent Understanding

…Notably, 17 out of 20 tested models perform worse than a random-guess baseline (15.2%) on Gem Set, while the estimated human performance is ~81.1%, showing substantial room for improvement…

May 11, 2026

Paper page - Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

…On several benchmarks, FEST outperforms baselines with magnitudes less SFT data, even matching their performance with full dataset.

May 15, 2026

Paper page - SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

…Experiments show that current vision backbones encode semantic structure but struggle with cross-category correspondence and object-part position, while LVLMs perform better at text-prompted localization than visual-reference matching. SOCO…

Jun 2, 2026

Paper page - The DAWN of World-Action Interactive Models

…Rather than eliminating test-time world evolution altogether or rolling out the full future in pixel space, DAWN performs a short explicit latent rollout that is sufficient to support long-horizon trajectory…

May 14, 2026

Paper page - Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

…training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers. We introduce Pion, a spectrum-preserving optimizer for large…

May 13, 2026

Paper page - Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

…Jiaqi Tang et al. Abstract: Robust-U1 enhances multimodal large language models' robustness against visual corruptions through self-recovery capabilities that improve both visual quality and reasoning performance. Generated by Qwen/Qwen2.5-Coder…

Jun 12, 2026

Paper page - FeatCal: Feature Calibration for Post-Merging Models

Paper page - StableI2I: Spotting Unintended Changes in Image-to-Image Transition

Paper page - Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Paper page - Many-Shot CoT-ICL: Making In-Context Learning Truly Learn