Search: Performance discussion

Paper page - Evaluating Large Language Models in Dynamic Clinical Decision-Making with Standardized Patient Cases

…Applying MedSP1000 to a range of general-purpose and medically specialized LLMs, we find that performance on static benchmarks does not reliably translate to such educational scenarios. The best-performing model, GPT…

Jun 4, 2026

Paper page - Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

…We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer…

Jun 10, 2026

Paper page - GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction

…While recent approaches treat NER and RE as separate tasks requiring distinct models, we introduce GLiNER-Relex, a unified architecture that extends the GLiNER framework to perform both entity recognition and relation…

May 13, 2026

Paper page - Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

…Tong Ye et al. Abstract: DOMINO enables domain-specific data synthesis through an inductive approach that learns domain representations from reference examples, improving code benchmark performance without requiring explicit domain descriptions. Generated by Qwen…

Jun 3, 2026

Paper page - A Causal Language Modeling Detour Improves Encoder Continued Pretraining

…Rian Touchent et al. Abstract: Switching from Masked Language Modeling to Causal Language Modeling during encoder adaptation improves downstream performance on biomedical texts through dense supervision effects in lower transformer layers. AI-generated summary…

May 13, 2026

Paper page - Can Muon Fine-tune Adam-Pretrained Models?

…Xingyu Qu et al. Abstract: Optimizer mismatch between Adam and Muon during fine-tuning degrades performance due to differing implicit biases, but this can be mitigated using parameter-efficient fine-tuning methods like LoRA…

Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

…AI-generated summary: Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO…

May 15, 2026
