Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench
…Although these benchmarks were developed in the “chatbot” era, they’ve persisted into the agent and tool-use era, joined by even more difficult scientific reasoning evals like FrontierScience and Humanity’s…