Showing top 6 results for "AI safety and weapons"

People also ask

What does AI have to do with dangerous weapons at all?

We worry about how AI might assist malicious actors with weapon acquisition and development both because of how it is similar to historical information and communication technologies and how it is different. In recent years, terrorist groups have rapidly adopted technologies like encrypted communications, cryptocurrency, and social media. We should expect nothing different from AI. Just as those seeking information about how to build weapons shifted from needing to acquire physical pamphlets or manuals to searching the internet, we can expect that they will query AI. What is different, though,

LLMs and biorisk

Why does this matter?

If an AI system is too cautious, it might refuse legitimate nuclear engineering coursework. Too permissive, and it could inadvertently assist bad actors. Our classifier appears to strike the right balance. In preliminary testing with synthetic data, we achieved a 94.8% detection rate for nuclear weapons queries and zero false positives (overall, 96.2% of the classifier’s labels in this test were accurate as shown in Figure 2), suggesting this system would not flag legitimate educational, medical, or research discussions as concerning. This precision matters because nuclear conversations in AI

Developing Nuclear Safeguards for AI

Is an LLM’s knowledge useful in an applied scenario?

In considering the contribution of AI to biorisk, we need to know more than just how well it performs on a quiz. We need to look at evaluations that involve real people, and closely mirror our actual threat scenarios. Moreover, just as we benchmark AI knowledge by comparing it to experts, we need to measure AI utility by comparing it to the most easily accessible alternative—in this case, the internet. To meet both of these criteria, we have conducted several controlled trials measuring AI’s ability to assist in the planning of a hypothetical bioweapons acquisition process. Participants were g

LLMs and biorisk

Developing Nuclear Safeguards for AI

… One example of how real-world deployment differs from testing is that the classifier flagged certain conversations about nuclear weapons that we ultimately determined to be benign. For example, recent events in the Middle East brought renewed attention to the issue of nuclear weapons. …

Aug 21, 2025

LLMs and biorisk

… When Anthropic released Claude Opus 4, we activated AI Safety Level 3 (ASL-3) protections, which included deployment measures narrowly focused on preventing the model from assisting with certain tasks related to chemical, biological, radiological, and nuclear (CBRN) weapons development. …

Sep 5, 2025

The Long-Term Benefit Trust

… Paul Christiano stepped down in April 2024 to take a new role as the Head of AI Safety at the U.S. AI Safety Institute. In January 2026, Kanika Bahl stepped down to begin a new nonprofit, the AI Access Initiative, and Zach Robinson stepped down to focus on non-profit and philanthropic work. …

Sep 19, 2023

Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

Large language models remain vulnerable to jailbreaks, techniques that can circumvent safety guardrails and elicit harmful information. …

Jan 9, 2026

Focus areas for The Anthropic Institute

… Our agenda focuses on four areas for research: Economic diffusion, Threats and resilience, AI systems in the wild, and AI-driven R&D. In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier AI systems. …

May 7, 2026

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

… It is calculated based on the actor's activity across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher the score, the higher-risk the AI-enabled actor is. …

Jun 3, 2026
