Search

Showing top 79 results for "AI safety safeguards" · filtered from 81 indexed

2028: Two scenarios for global AI leadership

…Opportunities for engagement on AI safety Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it…

May 14, 2026

Trustworthy agents in practice

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Apr 9, 2026

Introducing Claude Opus 4.7

…the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus…

Apr 16, 2026

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

…It is calculated based on the actor's activity across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher…

Jun 3, 2026

Discussions and forums

Hacker News · u/sergeysmirnov · 12h ago

Fable 5. Safety Taken to an Extreme

I finally decided to try out Fable 5 using the standard Claude.ai interface. I have a go-to test prompt that answers simple kids' questions in an absurdly scientific style. I asked a pretty straightforward one: "Why don'…

Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas"

…May 25, 2026, Pope Leo XIV released an encyclical on the topic of AI: "Magnifica humanitas: On safeguarding the human person in the time of artificial Intelligence." Anthropic co-founder Chris Olah…

May 25, 2026

Measuring AI agent autonomy in practice

…Training models to recognize and act on their own uncertainty is an important safety property that complements external safeguards like permission systems and human oversight. At Anthropic, we train Claude to ask…

Feb 18, 2026

Encryption, spyware, and now Mythos: History shows why cyber export control doesn't work | TechCrunch

…around Fable 5’s safeguards. Anthropic disputes the “jailbreak” label, calling it a narrow, already-patched issue rather than a wholesale defeat of the model’s safety measures. The result was the…

Jun 19, 2026 · Lorenzo Franceschi-Bicchierai

Claude Mythos can exploit decades-old vulnerabilities, but Anthropic is keeping it locked down

…Related Anthropic just dropped its core AI safety promise, and that should worry you History doesn't repeat itself, but AI companies sure do. Why is Anthropic keeping Mythos under wraps? For…

Apr 16, 2026 · Abhinav Raj

Introducing Claude Opus 4.8

…As we build fiduciary-grade AI systems for legal and tax professionals, advances like these help raise the standard for trusted AI performance in real-world workflows. Claude Opus 4.8 sets…

May 28, 2026

Widening the conversation on frontier AI

…Why we’re doing this Building safe, beneficial AI models requires deep technical work on alignment, interpretability, safeguards, evaluations, and more. But that work isn’t conducted—nor is AI deployed—in…

May 19, 2026
