Introducing Claude Opus 4.5
…Claude Opus 4.5 represents a breakthrough in self-improving AI agents. For automation of office tasks, our agents were able to autonomously refine their own capabilities, achieving peak performance in 4…
Pi: Open-Source AI Agent Terminal Set-Up
Deadline Day for Autonomous AI Weapons & Mass Surveillance
Two Rival Bets on AGI: Google I/O Highlights
Claude Mythos: Highlights from 244-page Release
The AI Hardware Podcast S2E6 // Mobileye, NVIDIA, Hailo, Athos, NXP
What the Freakiness of 2025 in AI Tells Us About 2026
…The problem, the researchers theorize, is that this kind of RLHF safety training couldn’t possibly cover every single type of ethically difficult situation an agentic AI might encounter. When a modern…
NVIDIA ACE is a suite of technologies for building AI agents for gaming. ACE provides ready-to-integrate cloud and on-device AI models for every part of in-game characters, from…
…AVI systems, etc. 🔹 Interaction Dialogue systems, embodied agents, conversational AVI, agentic multimodal systems, and interactive world modeling. ✨ Highlights of this survey: 📚 A unified taxonomy for AVI tasks and paradigms 🧠 Foundations of…
Hi HN, I built Agent OS because I was tired of the "orchestration tax" – writing the same safety checks, memory management, and tool-handling code in every AI agent project. What it does: - Visual policy edit…
I'm a recent grad from UMich and built AgentShield because agentic AI is moving fast but payment safety hasn't caught up. Agents are already being handed API keys, stablecoin wallets, and payment credentials - if one mis…
Hi HN, last month at a SundAI hackathon, my team built a prototype for an app called iClaw. The goal was to develop an AI agent using Apple Intelligence. I've since continued hacking away at this idea when I had time, and…
Current LLM benchmarks are broken. We think long-horizon "world" building could be an interesting additional way to evaluate LLMs, since it combines many aspects, such as the need for advanced reasoning, tool calling, working…
…Proximity agent - Maps and clusters generated hypotheses to help ensure a diverse, comprehensive exploration of the research space. Debate ideas: Reflection agent - Acts as a "virtual peer reviewer," critically evaluating hypotheses for…
…Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important…
…accuracy and safety. Initial evaluation focuses on incident summary accuracy: how well the model, embedded in a ReAct-style agent with tools, predicts and executes the correct resolution path for a given…
…This post explains nine techniques for customizing AI agents, along with criteria for selecting the right techniques for your use case. To learn about evaluating AI agents, see Mastering Agentic Techniques: AI…
…Once he sets off one agent to implement some new feature, he tasks another agent to do the preliminary work for the next task he has in mind. In effect, he is…
…Prompt Engineering for Code Generation (2026); A Systematic Approach for Large Language Models Debugging (2026); Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety (2026); Emergent…