Showing top 34 results for "AI safety risks"

People also ask

What safety risks?

If you’re willing to entertain the views outlined above, then it’s not very hard to argue that AI could be a risk to our safety and security. There are two common sense reasons to be concerned. First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human

Core views on AI safety: When, why, what, and how

2028: Two scenarios for global AI leadership

…While increasing numbers of researchers in China’s AI labs and policy community are concerned with AI safety risks, this trend has not translated into safety practices on par with labs in…

Measuring AI agent autonomy in practice

…7.7, risk: 2.2) Respond to fire emergencies and provide fire safety information (risk: 3.6, autonomy: 5.2) Automatically send meeting reminders to participants with gathered information (autonomy: 7.6…

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

…across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher the score, the higher-risk the AI enabled actor…

Claude Fable 5 and Claude Mythos 5

…Our wider suite of safeguards is discussed and evaluated in the model’s system card and our most recent risk report. Safety classifiers The frontier cybersecurity and research biology capabilities of Mythos…

Claude's extended thinking

…Claude 3.7 Sonnet’s safety mechanisms AI Safety Level. Anthropic’s Responsible Scaling Policy commits us not to train or deploy models unless we have implemented appropriate safety and security measures…

Focus areas for The Anthropic Institute

…Economic diffusion Threats and resilience AI systems in the wild AI-driven R&D In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier…

Introducing Claude Sonnet 5

…Safety evaluations Our pre-deployment safety evaluations found that Sonnet 5 was overall an improvement on Sonnet 4.6. On agentic safety, the model is better at refusing malicious requests…

Teaching Claude why

…Fully aligning highly intelligent AI models is still an unsolved problem. Model capabilities have not yet reached the point where alignment failures like blackmail propensity would pose catastrophic risks, and it remains…

Introducing Sonnet 4.6

…Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes…

Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

…built-in safety training. We were particularly interested in whether the classifiers could prevent universal jailbreaks—consistent attack strategies that work across many queries—since these pose the greatest risk of enabling…