Showing top 114 results for "Benchmarks and reliability"

Water company spins out homegrown AI after LLMs failed it

…Rozum outscored GPT-4, Grok 4, and Gemini 3.1 Pro on the Humanity's Last Exam benchmark by several percentage points or more in every category but one. "When we ran…

Mar 18, 2026 · Thomas Claburn

Wi-Fi 8 Is Almost Here: Broadcom Unveils New Ultra High Reliability Chips | Dong Knows Tech

…Fi 8 solutions are standards-based and reflect “decades of architectural enhancements that result in a smaller footprint and better battery efficiency” compared to benchmark Wi-Fi 7 solutions in the market…

Oct 14, 2025 · Dong Ngo

NVIDIA GeForce GTX 1070 Ti Fire Strike Extreme and Time Spy Benchmarks Leaked - Clocked at 1886 MHz Boost Frequency

…Unlike the former however, 3DMark's benchmarks are much more reliable for comparison purposes and we have both the FireStrike Extreme and Time Spy flavors. In the Time Spy benchmark, the GTX…

Oct 18, 2017 · Usman Pirzada

Features | Tom's Hardware

…Through its innovation and reliability, ADATA has gone from a start-up to the second-largest SSD and DRAM manufacturer in the world. ASML shipped 48 EUV lithography systems and…

May 12, 2026

Discussions and forums

r/LocalLLaMA · u/Glittering_Focus1538 · 1w ago

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls …

Hacker News · u/zambelli · 1w ago

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling. What it does: - Adds domain-and-tool-agnostic guardrails (retry nudges, step e…
