Search: performance benchmarking

Paper page - LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

…Across a diverse set of benchmarks covering document understanding, OCR, and general VQA, LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% while matching or even surpassing baseline performance. These results…

May 12, 2026

Paper page - MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

…On established Multimodal Tabular Learning benchmarks, we show that tuning the embeddings to the task improves performance. Existing benchmarks, however, often focus on the mere co-occurrence of modalities; this leads to…

May 14, 2026

Paper page - FastKernels: Benchmarking GPU Kernel Generation in Production

…Benchmarking GPU Kernel Generation in Production. Published on May 22; submitted by Gabriele Oliaro on May 27 (Snowflake). Abstract: FastKernels addresses the gap between benchmark evaluation and production performance for LLM…

May 27, 2026

Paper page - PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

…Qiyao Wang et al. Abstract: The PatRe benchmark models the complete patent examination process as a dynamic, multi-turn interaction between examiners and applicants, revealing key performance differences among LLMs in legal reasoning and technical…

May 6, 2026

Paper page - SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

…SeePhys Pro benchmark reveals that current multimodal models struggle with representation-invariant reasoning when information shifts from text to visual formats, and demonstrates that blind training can improve performance through residual textual…

May 13, 2026

Supercharge your OCR Pipelines with Open Models

…LightOnOCR-1B would fit nicely in this comparison as a strong performer that punches above its weight: 🎯 Performance: achieves state-of-the-art results on the OlmOCR Benchmark for its size and beats DeepSeek…

May 28, 2026 · merve

Paper page - Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

…A multi-scale study further examines how model performance changes as balanced training data increases from 1K to 10K to 100K images. Urban-ImageNet provides a unified, theory-grounded, multi-city benchmark for…

May 13, 2026

Paper page - From Pixels to Concepts: Do Segmentation Models Understand What They Segment?

…Our CAFE provides a controlled benchmark for diagnosing whether promptable segmentation models perform concept-faithful grounding rather than shortcut-driven mask retrieval.

May 14, 2026

Paper page - Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

…Experiments demonstrate that our dataset and the model trained on it achieve substantially better performance than all existing baselines on both OpenVE-Bench and Sparkle-Bench. Our proposed dataset, benchmark, and model…

May 8, 2026

Paper page - A Benchmark for Interactive World Models with a Unified Action Generation Framework

arxiv:2605.03941. Published on May 5; submitted by taesiri on May 6. Abstract: A comprehensive benchmark named…