r/netsec
· u/Fickle-Box1433
· 3w ago
I evaluated 5 LLM agents on patching real-world CVEs. Here is what I found.
I built an independent benchmark with 20 real CVEs across 15 CWE categories, 5 models (3 OpenAI, 2 Poolside Laguna), and three prompt conditions: full advisory, behavioral description only, and location only (file and functi…
Hacker News
· u/imviky
· 3d ago
Ask HN: How do you find out if the LLM API is giving degraded responses?
If you are building on top of multiple LLM APIs, or even a single one among OpenAI, Claude, Gemini, etc., what do you do when the API starts degrading (slow TTFT, elevated error rates, timeouts)? Or even worse, when ther…
Hacker News
· u/zambelli
· May 19, 2026
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.
What it does:
- Adds domain-and-tool-agnostic guardrails (retry nudges, step e…