Showing top 12 results for "Model performance claims"

huggingface.co › papers › 2605.04523

Paper page - RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

arXiv:2605.04523. Published on May 6. Submitted by Ivan Bondarenko on May 8. Novosibirsk State University. Authors: Ivan Bondarenko, Roman Derunets, …, Ivan Chern… …

May 8, 2026

huggingface.co › papers › 2602.00095

Paper page - EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions

… We really appreciate your interest and feedback ^ ^. The four-way taxonomy of recognition errors and the diagnostic LLM detector that ties those errors to downstream grading is the standout part for me. It's refreshing to see upstream recognition and autograding evaluated together, and the claim th… …

May 8, 2026