Coding agents in the social sciences
…But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work. There are sharp disparities in use of coding agents. Twice as…
Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: Our post-training process was accidentally encouraging this behavior with misaligned rewards.This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback RLHF data that did not include any agentic tool use. T
Teaching Claude why…But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work. There are sharp disparities in use of coding agents. Twice as…
…We focus on Claude Code usage through a command-line interface (CLI), Claude.ai , or the Claude Code desktop app. 4 By tracking how agentic coding usage changes as models get more…
…Claude 4 models outperform other frontier models as research agents across financial tasks in Vals AI's Finance Agent benchmark . When deployed by FundamentalLabs to build an Excel agent, Claude Opus 4…
…We have updated the model cards for both Claude Opus 4.6 and Claude Sonnet 4.6. For the Opus 4.6 multi-agent configuration described in this report, the run we…
…For this autonomous update, we couldn’t ask Claude to use a physical controller, nor did we evaluate the time it took a researcher to use the Claude-programmed controller to retrieve…
…Box evaluated how Claude Sonnet 4.6 performs when tested on deep reasoning and complex agentic tasks across real enterprise documents. It demonstrated significant improvements, outperforming Claude Sonnet 4.5 in heavy…
…The shopkeeping AI agent—nicknamed “Claudius” for no particular reason other than to distinguish it from more normal uses of Claude—was an instance of Claude Sonnet 3.7, running for a…
Engineering at Anthropic Introducing advanced tool use on the Claude Developer Platform The future of AI agents is one where models work seamlessly across hundreds or thousands of tools. An IDE assistant…
…Claude can make good use of autonomy and tools The HackTheBox competition also demonstrated the agentic capabilities of Claude. Once our researcher started the script late, he went back to moving into…
…Related content Making Claude a chemist Coding agents in the social sciences Results from a survey of 1,260 social scientists about AI and coding agent use. Project Glasswing: An initial update…