Agents for financial services
…Our investment professionals live in data and analytical models, and Claude for Excel meets them there. Analysts are using it to build and update coverage models, separate signal from noise, and pressure…
Alignment The persona selection model Feb 23, 2026 AI assistants like Claude can seem surprisingly human. They express joy after solving tricky coding tasks. They express distress when…
…The evaluations we used are intended to elicit particularly egregious misaligned actions that normal Claude models never engage in. One result is unsurprising: the model learns to reward hack. This is to…
…Output obfuscation attacks prompt models to disguise their outputs in ways that appear harmless if a classifier is only looking at a model’s output. For example, during adversarial testing, attackers successfully…
…update and refine the safeguards after launch. Below we discuss each of Fable 5’s new safeguards in turn. Our wider suite of safeguards is discussed and evaluated in the model’s…
Product Introducing Claude Sonnet 4.6 Feb 17, 2026 Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding…
…Claude 4 models outperform other frontier models as research agents across financial tasks in Vals AI's Finance Agent benchmark. When deployed by FundamentalLabs to build an Excel agent, Claude Opus 4…
Interpretability Signs of introspection in large language models Oct 29, 2025 Have you ever asked an AI model what’s on its mind? Or to explain how it came…
…We measured three Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) against ChemDraw and MestReNova on 20 compounds drawn from synthetic chemistry preprints published after the models’ training cutoff…
…We tracked how model activations moved along the Assistant Axis throughout each conversation. The pattern was consistent across the models we tested. While coding conversations kept models firmly in Assistant territory throughout…