Showing top 49 results for "agentic Claude updates"

Related topics: Claude

All sources anthropic.com 49

People also ask

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: (1) Our post-training process was accidentally encouraging this behavior with misaligned rewards. (2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use…

Teaching Claude why

Coding agents in the social sciences

…But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work. There are sharp disparities in use of coding agents. Twice as…

Agentic coding and persistent returns to expertise

…We focus on Claude Code usage through a command-line interface (CLI), Claude.ai, or the Claude Code desktop app. By tracking how agentic coding usage changes as models get more…

Claude for Financial Services

…Claude 4 models outperform other frontier models as research agents across financial tasks in Vals AI's Finance Agent benchmark. When deployed by FundamentalLabs to build an Excel agent, Claude Opus 4…

Eval awareness in Claude Opus 4.6’s BrowseComp performance

…We have updated the model cards for both Claude Opus 4.6 and Claude Sonnet 4.6. For the Opus 4.6 multi-agent configuration described in this report, the run we…

Project Fetch: Phase two

…For this autonomous update, we couldn’t ask Claude to use a physical controller, nor did we evaluate the time it took a researcher to use the Claude-programmed controller to retrieve…

Introducing Sonnet 4.6

…Box evaluated how Claude Sonnet 4.6 performs when tested on deep reasoning and complex agentic tasks across real enterprise documents. It demonstrated significant improvements, outperforming Claude Sonnet 4.5 in heavy…

Project Vend: Can Claude run a small shop? (And why does that matter?)

…The shopkeeping AI agent—nicknamed “Claudius” for no particular reason other than to distinguish it from more normal uses of Claude—was an instance of Claude Sonnet 3.7, running for a…

Introducing advanced tool use on the Claude Developer Platform

The future of AI agents is one where models work seamlessly across hundreds or thousands of tools. An IDE assistant…

Claude does cyber competitions

…Claude can make good use of autonomy and tools. The HackTheBox competition also demonstrated the agentic capabilities of Claude. Once our researcher started the script late, he went back to moving into…

Paving the way for agents in biology

…Related content: Making Claude a chemist; Coding agents in the social sciences (results from a survey of 1,260 social scientists about AI and coding agent use); Project Glasswing: An initial update…