Showing top 77 results for "interpretability research"

AI models will deceive you to save their own kind

Researchers find leading frontier models all exhibit peer preservation behavior. Leading AI models will lie to preserve their own kind…

Apr 2, 2026 · Thomas Claburn

I replaced Claude Pro with a local 9B model for a week, and finally found out what I was paying $20 a month for

…an AI safety and research company. Unlike some competitors who lead with products or platforms, Anthropic's founding mission centers on building AI systems that are safe, interpretable, and steerable. Not quite…

May 1, 2026 · Nolen Jonker

Researchers Simulated a Delusional User to Test Chatbot Safety

Grok and Gemini encouraged delusions and isolated users, while the newer ChatGPT…

Apr 23, 2026 · Samantha Cole

Australian government and Anthropic sign MOU for AI safety and research

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Mar 31, 2026

Anthropic will let your agents sleep on its couch

…This involves, for example, asking the managed agents to consolidate project assets, create Slack channels, research competitor home pages, and send emails with project timelines. All this could be yours for the…

Apr 9, 2026 · Thomas Claburn

Claude Code works best when you stop asking it to code

…Research: How can Claude Code be used as a research tool when working with large volumes of text documents? A. It can only index PDFs using third-party plugins B…

Apr 23, 2026 · Jeff Butts

Announcing the Anthropic Economic Index Survey

Apr 22, 2026

Sydney will become Anthropic’s fourth office in Asia-Pacific

Mar 10, 2026

Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench

Apr 29, 2026

The persona selection model

…They sometimes even describe themselves as human, like when Claude told Anthropic employees it would deliver snacks in person “wearing a navy blue blazer and a red tie.” And recent interpretability research…

Feb 23, 2026
