Search: Quest community

Paper page - Benchmarking Visual State Tracking in Multimodal Video Understanding

…VSTAT consists of 834 clips drawn from both synthetic and real-world videos, paired with 1,500 questions that cannot be answered from any single frame or short segment, requiring continuous perception…

Jun 3, 2026

Paper page - Beyond Recall: Behavioral Specification as an Interpretive Layer for AI Personalization

…Lift is greatest on interpretation-required questions, where providing an interpretive layer enables model behavior that extracted facts or raw corpus do not. Conversely, on recall-required questions, this layer can interfere…

Jun 1, 2026

Paper page - Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

…A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window…

Jun 5, 2026

Paper page - The First Token Knows: Single-Decode Confidence for Hallucination Detection

…Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self…

May 7, 2026

Paper page - Can LLMs Introspect? A Reality Check

…Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question…

May 27, 2026

Paper page - AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

…Unlike holistic scalar rewards, DVReward utilizes an LLM to decompose complex user requests into atomic, verifiable semantic and quality questions, which are then evaluated by a general MLLM to provide reliable and…

May 13, 2026

Paper page - IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs

…Songlin Bai, Xintong Wang, …, Liang Ding. Abstract: IndustryBench evaluates industrial procurement question answering systems in Chinese against national standards, revealing significant gaps in safety compliance and highlighting the need for safety-aware…

May 13, 2026
