Search

Showing top 110 results for "GPU needs for LLMs"

Videos

LM Studio's frontend was slowing me down, so I switched to this instead

…serves local LLM models via an OpenAI-compatible API that most tools can use. But it's not the right tool for my needs. LM Studio was good for dipping my toe…

Apr 22, 2026 · Joe Rice-Jones

You don't need an expensive GPU to run a local LLM that actually works

…The former is the best choice for running LLMs with Nvidia GPUs and tool sets leading the way. But how much do you need to spend on a GPU to comfortably run…

Apr 29, 2026 · Rich Edmonds

GPU passthrough to LXCs beats VMs in Proxmox, and it's way simpler than you'd think

…Once that’s done, you’ll need the device number associated with your GPU components. I’ve got Nvidia graphics cards, and the command for checking the GPU details is ls -l…

Mar 26, 2026 · Ayush Pande

After a year of self-hosting LLMs, I realized the real bottleneck isn’t the GPU

…Quiz 8 Questions · Test Your Knowledge You don't need a beefy GPU to run a local LLM Trivia challenge Think you know your way around local AI? Test your knowledge of…

May 6, 2026 · Yash Patel

Stop obsessing over your GPU's core clock — memory clock matters more for local LLM inference

…Of course, you also need a relatively newer GPU architecture and sufficient memory capacity to avoid stalled LLM workloads. The next time you are optimizing your GPU for LLM inference, focus on…

Mar 28, 2026 · Tanveer Singh

Nvidia's restarting RTX 3060 production, and it's good news for gamers, great news for homelabbers

…entry-level option for home labs. 12GB of VRAM gives you the headroom to actually run local LLMs. If you've tried running local LLMs on an 8GB GPU like the RTX…

Apr 2, 2026 · Hamlin Rozario

Your GPU does way more than gaming, and it's the reason your PC doesn't feel broken

…Outside of flashy ray tracing demos, our GPUs deliver everyday magic, and they never even ask for credit…

May 5, 2026 · Samarveer Singh

Discussions and forums

r/homelab · u/AntifaAustralia · 2w ago

My first 10 inch rack with local LLM! No more Spotify, Google Home, Netflix, ChatGPT...

I'm pretty new to homelabbing and this is my first mini rack! Started with the Beelink ME Mini and then just kinda grew from there (it's always the way hey haha). It idles at 70 watts (not too shabby for how much is goin…

r/LocalLLaMA · u/APFrisco · 2w ago

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

r/LocalLLaMA · u/janvitos · 2w ago

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec w…

r/LocalLLaMA · u/ex-arman68 · 2w ago

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

2026-05-07 edit: I have updated the hardware based recommendations with more focus on quality. I do not recommend q4_0 KV cache anymore beyond 64k context. After multiple rounds of testing with the different size quants,…

r/selfhosted · u/lazycodewiz · 1w ago

Top stories

I built my own Googlebook with a Raspberry Pi, local LLMs, and old hardware

I added a second GPU just for local AI workloads, and it cost less than upgrading my main one

13 years later, the GTX Titan is still the most important GPU Nvidia ever made

My RTX 5090 can't keep up with Apple Silicon on the biggest local LLMs, and I hate to admit it

Discussions and forums

services with actually generous free tiers for open-source projects. my list, what would you add?

I thought I needed a GPU for local LLMs until I tried this lean model

Intel Quick Sync is the reason I will never buy another Nvidia card for my Jellyfin server

Intel's $949 GPU has 32GB of VRAM for local AI, but the software is why Nvidia keeps winning