Search

Showing top 25 results for "Competition with Anthropic"

Related topics: Anthropic

People also ask

Why enter Claude into cyber competitions?

AI is poised to transform the domain of cybersecurity. Anthropic’s Safeguards team recently identified and banned a user with limited coding abilities leveraging Claude to develop malware. Research suggests that this lowering of the bar for expertise needed to pose a threat, combined with the falling costs of large language models (LLMs), presages a dramatic shift in the economics of cyberattacks.[1] To understand the present state of AI cyber capabilities and gain insight into their trajectory, we pursue different approaches to model evaluation, including publicly available and custom-made be

Claude does cyber competitions
Why did you have an LLM run a small business?

As AI becomes more integrated into the economy, we need more data to better understand its capabilities and limitations. Initiatives like the Anthropic Economic Index provide insight into how individual interactions between users and AI assistants map to economically-relevant tasks. But the economic utility of models is constrained by their ability to perform work continuously for days or weeks without needing human intervention. The need to evaluate this capability led Andon Labs to develop and publish Vending-Bench, a test of AI capabilities in which LLMs run a simulated vending machine busi

Project Vend: Can Claude run a small shop? (And why does that matter?)