Showing top 115 results for "reviews and benchmarks"

Discussions and forums

r/LocalLLaMA · u/janvitos · 1w ago

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

Had been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was barely above non-MTP. So, I decided to try out ik_llama.cpp since i…

Hacker News · u/bhu8 · 2w ago

Show HN: Viberia – Civ/Polytopia-like command center for AI agents (BYOK/BYOS)

Hey HN, This is my take on the agent harness. Everything on an isometric map. Agents are grouped into "buildings" that run in a sequence or a loop; e.g., the CodeForge has an agent that writes a PRD, another one that impl…

r/LocalLLaMA · u/janvitos · 3w ago

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec w…

Hacker News · u/najmuzzaman · Apr 25, 2026

Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)

I shipped a wiki layer for AI agents that uses markdown + git as the source of truth, with a bleve (BM25) + SQLite index on top. No vector or graph db yet. It runs locally in ~/.wuphf/wiki/ and you can git clone it out if…

r/LocalLLaMA · u/spencer_kw · 4w ago

DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.

That foodtruck bench post showing deepseek v4 matching gpt-5.2 at 17x cheaper got me thinking. if frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all? Ran m…
