MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
…As large language models demonstrate advanced capabilities in reasoning, coding, and engineering tasks, it is increasingly important to understand whether they can discover such methods rather than only apply existing ones. We…