Paper page - SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?
Papers arxiv:2605.30329 SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? …
Papers arxiv:2605.10813 NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation Published on May 11 Submitted by taesiri on May 12 Authors: …, Siyuan Li, …, Cheng Tan Abstract NanoResearch is a multi-agent framework that enhances research autom… …
…In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging automated…
…The system combines literature-aware ideation, validated simulation setup generation, automated execution, mesh-independence checking, simulator source-code modification, VLM-based physics verification, reference-data alignment, and figure-grounded manuscript writing within…
…The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design…
…https://github.com/CharlesCNorton/Language-Model-Tools/tree/main/AutoMUD Former MUD player here, love this idea! Really nice, was sharing it with some…
…Synthesizing Open-Ended Coding Problems at Scale Published on May 14 Submitted by Qiuyang Mang on May 15 Authors: … Abstract FrontierSmith automates the creation of open-ended coding problems from closed-ended…
…Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD…
… Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. …
…The idea of using a harness to generate high-quality training data for delegation intelligence is a clever way to bypass the scarcity of this kind of logic in general text. It…