A “diff” tool for AI: Finding behavioral differences in new models
… This approach to safety is inherently reactive . …
If you’re willing to entertain the views outlined above, then it’s not very hard to argue that AI could be a risk to our safety and security. There are two common sense reasons to be concerned. First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human
Core views on AI safety: When, why, what, and how… This approach to safety is inherently reactive . …
… The role of frontier models in empirical safety A major reason Anthropic exists as an organization is that we believe it's necessary to do safety research on "frontier" AI systems. This requires an institution which can both work with large models and prioritize safety 5 . …
… Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.” Computer use Almost every organization has software it can’t easily automate… …
… We’re opening a new office in Sydney in the coming weeks, and we’ve signed a Memorandum of Understanding with the Australian government to cooperate on AI safety research and support the goals of Australia’s National AI Plan. …
… A step forward on safety These intelligence gains do not come at the cost of safety. …
… How willing users are to experiment with AI, and whether policymakers create a regulatory context that advances both safety and innovation, will shape how AI transforms economies. …
…Competition aside, benchmarks help us tackle an important question: whether models are capable and reliable enough to support, or even produce, professional-level work. Scientists are using models to write code for…
…We used two patterns to ensure this. Auth can be bundled with a resource or held in a vault outside the sandbox. For Git, we use each repository’s access token to…
…A sketch of caffeine, for example, allows a chemist to spot its resemblance to adenosine, the body’s drowsiness signal, and predict that it keeps us alert by blocking the receptor. However…
…To address hiring directly, we use the panel dimension of the CPS, counting the percent of young (22-25 year old) workers who begin a new job in a more vs. less…
To show you the most relevant results, we’ve omitted some entries very similar to those already shown. Repeat the search with the omitted results included.