Search

Showing top 129 results for "AI safety safeguards"

People also ask

What’s next?

As noted above, we have deployed the classifier as an experimental addition to our Safeguards framework, monitoring a percentage of Claude traffic. Its real-world performance has confirmed that the classifier works effectively beyond our testing environment. Whereas our synthetic test data provided clear examples of harmful and benign exchanges, the distribution of actual user traffic proved more complex and surprising, yet the classifier still performed well. One example of how real-world deployment differs from testing is that the classifier flagged certain conversations about nuclear weapon

Developing Nuclear Safeguards for AI

Top stories

Discussions and forums