GPT-5.5 dominates $1,500 LLM hacking test while Gemini refuses to even try

… For anyone running security tooling at scale, that gap should make a huge difference. Claude Sonnet 4.6 and Claude Opus 4.8 each solved 2 out of 10 runs, but Opus in particular got close multiple times before safety guardrails ended the session. …

Jun 4, 2026 · Anubhav Sharma