Search: interpretability research

Sydney will become Anthropic’s fourth office in Asia-Pacific

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Mar 10, 2026

Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench

Apr 29, 2026

The persona selection model

…They sometimes even describe themselves as human, like when Claude told Anthropic employees it would deliver snacks in person “wearing a navy blue blazer and a red tie.” And recent interpretability research…

Feb 23, 2026

Labor market impacts of AI: A new measure and early evidence

Mar 5, 2026

A “diff” tool for AI: Finding behavioral differences in new models

Mar 13, 2026

Anthropic Sydney office

Apr 27, 2026

What 81,000 people told us about the economics of AI

Apr 22, 2026

Paving the way for agents in biology

Jun 8, 2026

Anthropic forms $200 million partnership with the Gates Foundation

May 14, 2026

Emotion concepts and their function in a large language model

…In a new paper from our Interpretability team, we analyzed the internal mechanisms of Claude Sonnet 4.5 and found emotion-related representations that shape its behavior. These correspond to specific patterns…

Apr 2, 2026
