BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
… Our results show that BenchEvolver can convert saturated benchmarks into frontier-level evaluation suites and reusable training signal.

Static benchmarks are rapidly saturating as frontier LLMs improve. …