Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
How can you robustly separate ill-posedness from policy-driven refusal across models with different safety configurations?
