2028: Two scenarios for global AI leadership
…While increasing numbers of researchers in China’s AI labs and policy community are concerned with AI safety risks, this trend has not translated into safety practices on par with labs in…
If you’re willing to entertain the views outlined above, then it’s not very hard to argue that AI could be a risk to our safety and security. There are two common sense reasons to be concerned. First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human
Core views on AI safety: When, why, what, and how…While increasing numbers of researchers in China’s AI labs and policy community are concerned with AI safety risks, this trend has not translated into safety practices on par with labs in…
…7.7, risk: 2.2) Respond to fire emergencies and provide fire safety information (risk: 3.6, autonomy: 5.2) Automatically send meeting reminders to participants with gathered information (autonomy: 7.6…
…across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher the score, the higher-risk the AI enabled actor…
…Our wider suite of safeguards is discussed and evaluated in the model’s system card and our most recent risk report . Safety classifiers The frontier cybersecurity and research biology capabilities of Mythos…
…Claude 3.7 Sonnet’s safety mechanisms AI Safety Level. Anthropic’s Responsible Scaling Policy commits us not to train or deploy models unless we have implemented appropriate safety and security measures…
…Economic diffusion Threats and resilience AI systems in the wild AI-driven R&D In Core Views on AI Safety , we wrote that doing effective safety research required close contact with frontier…
…01 / 10 Safety evaluations Our pre-deployment safety evaluations found that Sonnet 5 was overall an improvement on Sonnet 4.6. On agentic safety, the model is better at refusing malicious requests…
…Fully aligning highly intelligent AI models is still an unsolved problem. Model capabilities have not yet reached the point where alignment failures like blackmail propensity would pose catastrophic risks, and it remains…
…Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes…
…built-in safety training. We were particularly interested in whether the classifiers could prevent universal jailbreaks—consistent attack strategies that work across many queries—since these pose the greatest risk of enabling…