Search

Showing top 110 results for "GPU needs for LLMs"

Related topics: LLMs

Top stories

Discussions and forums

r/homelab · u/AntifaAustralia · 2w ago

My first 10 inch rack with local LLM! No more Spotify, Google Home, Netflix, ChatGPT...

I'm pretty new to homelabbing and this is my first mini rack! Started with the Beelink ME Mini and then just kinda grew from there (it's always the way hey haha). It idles at 70 watts (not too shabby for how much is goin…

r/LocalLLaMA · u/APFrisco · 1w ago

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

r/LocalLLaMA · u/janvitos · 2w ago

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec w…

r/LocalLLaMA · u/ex-arman68 · 2w ago

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

2026-05-07 edit: I have updated the hardware based recommendations with more focus on quality. I do not recommend q4_0 KV cache anymore beyond 64k context. After multiple rounds of testing with the different size quants,…

r/selfhosted · u/lazycodewiz · 1w ago

services with actually generous free tiers for open-source projects. my list, what would you add?

Been in the weeds shipping an OSS side project for the past few weeks (social media publishing API). Real launch post is coming, this isn't that. Along the way I kept a list of services that actually have usable free tie…