LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
…Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy–cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales…