Showing top 17 results for "AI safety safeguards"

People also ask

What does AI have to do with dangerous weapons at all?

We worry about how AI might assist malicious actors with weapon acquisition and development because of both how it resembles historical information and communication technologies and how it differs from them. In recent years, terrorist groups have rapidly adopted technologies like encrypted communications, cryptocurrency, and social media. We should expect nothing different from AI. Just as those seeking information about how to build weapons shifted from acquiring physical pamphlets or manuals to searching the internet, we can expect that they will query AI. What is different, though,

LLMs and biorisk
What’s next?

As noted above, we have deployed the classifier as an experimental addition to our Safeguards framework, monitoring a percentage of Claude traffic. Its real-world performance has confirmed that the classifier works effectively beyond our testing environment. Whereas our synthetic test data provided clear examples of harmful and benign exchanges, the distribution of actual user traffic proved more complex and surprising, yet the classifier still performed well. One example of how real-world deployment differs from testing is that the classifier flagged certain conversations about nuclear weapon

Developing Nuclear Safeguards for AI