Showing top 113 results for "reviews and benchmarks"

Discussions and forums

r/LocalLLaMA · u/janvitos · 1w ago

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

Had been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was barely above non-MTP. So, I decided to try out ik_llama.cpp since i…

Hacker News · u/bhu8 · 2w ago

Show HN: Viberia – Civ/Polytopia-like command center for AI agents (BYOK/BYOS)

Hey HN, this is my take on the agent harness. Everything on an isometric map. Agents are grouped into "buildings" that run in a sequence or a loop; e.g., the CodeForge has an agent that writes a PRD, another one that impl…

r/LocalLLaMA · u/janvitos · 3w ago

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec w…

Hacker News · u/najmuzzaman · Apr 25, 2026

Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)

I shipped a wiki layer for AI agents that uses markdown + git as the source of truth, with a bleve (BM25) + SQLite index on top. No vector or graph db yet. It runs locally in ~/.wuphf/wiki/ and you can git clone it out if…

r/LocalLLaMA · u/spencer_kw · 3w ago

DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.

That foodtruck bench post showing deepseek v4 matching gpt-5.2 at 17x cheaper got me thinking. if frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all? Ran m…