Trustworthy agents in practice
… We tackle this from multiple angles during Claude’s training. First, we construct training scenarios that place Claude in ambiguous situations, and then reinforce Claude’s choice to pause, rather than to assume. …
… We tackle this from multiple angles during Claude’s training. First, we construct training scenarios that place Claude in ambiguous situations, and then reinforce Claude’s choice to pause, rather than to assume. …
… Model developers should consider training models to recognize their own uncertainty. Training models to recognize their own uncertainty and surface issues to humans proactively is an important safety property that complements external safeguards like human approval flows and access restrictions. …
… This raises questions about how the character of an AI system should be shaped: What does it mean for an AI to be good? Which traits and behaviors should it display, and under what circumstances? …
… Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it’s also common to include stopping conditions such as a maximum number of iterations to maintain control. …