Long-running Claude for scientific computing
Science Long-running Claude for scientific computing Mar 23, 2026 In this post, Siddharth Mishra-Sharma , a researcher on the Discovery team, explains how to apply multi-day agentic coding workflows—test…
Science Long-running Claude for scientific computing Mar 23, 2026 In this post, Siddharth Mishra-Sharma , a researcher on the Discovery team, explains how to apply multi-day agentic coding workflows—test…
…FHIR development and a sample prior authorization review skill. FHIR is the modern standard for exchanging data between healthcare systems, and this skill helps to improve interoperability by enabling developers to connect…
…Although these benchmarks were developed in the “chatbot” era, they’ve persisted into the agent and tool-use era, joined by even more difficult scientific reasoning evals like FrontierScience and Humanity's…
…Through our internal work and with customers at the frontier of agent development, we’ve learned how to design more rigorous and useful evals for agents. Here's what's worked across…
…software development ("Help debug, develop, and optimize software across multiple programming domains") and personal life management ("Assist with personal life management and everyday tasks"). Figure 2.2 shows the primitive profile for…
…Most harmful labor market developments of AI should arguably include a period of increased unemployment, as displaced workers search for alternatives. The Current Population Survey is well suited to tracking this, as…
…Claude Opus 4.5, an AI research assistant developed by Anthropic, performed all calculations including the SCET factorization theorem derivation, one-loop soft and jet function calculations, EVENT2 Monte Carlo simulations, numerical…