Showing top 11 results for "AI performance claims"

huggingface.co › papers › 2602.00095

Paper page - EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions

… We really appreciate your interest and feedback ^ ^. The four-way taxonomy of recognition errors and the diagnostic LLM detector that ties those errors to downstream grading is the standout part for me. It's refreshing to see upstream recognition and autograding evaluated together, and the claim th… …

May 8, 2026