How we contain Claude across products
… The external content the agent can reach. MCP servers, third-party plugins, and web search tools all feed content into the agent’s context from sources you don’t control. …
… The external content the agent can reach. MCP servers, third-party plugins, and web search tools all feed content into the agent’s context from sources you don’t control. …
… On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability. …
… Instead, a false positive costs a single retry where the agent gets a nudge, reconsiders, and usually finds an alternative path. What's next We'll continue expanding the real overeagerness testset and iterating on improving the safety and cost of the feature. …
… Designing for human control In our framework, we outlined the core tension with agents: to be useful, they need to work autonomously, but to keep them secure, humans still need to retain meaningful control over how they work. …
… Combined with its speed, token efficiency, and surprisingly low cost, it’s the first time we’re making Opus available in Notion Agent. …
… Reliability of agents: What aspects of autonomous AI agents could be adapted to fit into existing laws, governance systems, and accountability mechanisms? For example, can we ensure AI agents have a unique identity that they reliably output, even in the absence of direct human control? …
… Agents' autonomy makes them ideal for scaling tasks in trusted environments. The autonomous nature of agents means higher costs, and the potential for compounding errors. …
… While compaction preserves continuity, it doesn't give the agent a clean slate, which means context anxiety can still persist. A reset provides a clean slate, at the cost of the handoff artifact having enough state for the next agent to pick up the work cleanly. …
… We’re also introducing adaptive thinking , where the model can pick up on contextual clues about how much to use its extended thinking, and new effort controls to give developers more control over intelligence, speed, and cost. …
… New agent templates for finance work Each agent template is a reference architecture that packages three things: skills instructions and domain knowledge for the task , connectors governed access to the data the task runs on , and subagents additional Claude models that are called upon by the main … …