
Showing top 110 results for "GPU needs for LLMs"


A Modder Repurposed a Used V100 For LLM Acceleration

… None of this is efficient, as V100 GPUs aren't easy to come by, and most people are unlikely to manage the mod themselves. But it does raise hopes for the mountains of GPUs that have been bought by AI companies. …

May 11, 2026 · Jon Martindale

Your old GPU can still run big LLMs – you just need the right tweaks

… Offloading layers lets me run massive LLMs on weak GPUs. That’s how I managed to deploy Qwen3.6-35B-A3B on 12GB of VRAM. Although your GPU is the ideal component for providing extra processing oomph to your LLMs, it’s not the only device capable of running them. …

May 6, 2026 · Ayush Pande
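Offloading layers, as described in the result above, comes down to a VRAM budget. The snippet doesn't say how the author split Qwen3.6-35B-A3B between GPU and CPU, so this is only a rough Python sketch under simplifying assumptions: every layer is the same size, and a fixed amount of VRAM (the hypothetical `reserved_bytes`) is held back for the KV cache and runtime overhead. The helper name `layers_that_fit` and all numbers are illustrative, not taken from the article.

```python
GIB = 1024 ** 3


def layers_that_fit(model_bytes: float, n_layers: int, vram_bytes: float,
                    reserved_bytes: float) -> int:
    """Estimate how many transformer layers fit on the GPU.

    Assumption: every layer is model_bytes / n_layers in size. Real models
    have embeddings, an output head, and uneven layers, so this is rough.
    reserved_bytes is VRAM kept free for the KV cache, activations, and
    runtime overhead.
    """
    per_layer = model_bytes / n_layers
    budget = vram_bytes - reserved_bytes
    if budget <= 0:
        return 0
    # Whole layers only; never more layers than the model has.
    return min(n_layers, int(budget // per_layer))


# Hypothetical example: a ~20 GiB quantized model with 48 layers on a
# 12 GiB card, keeping 3 GiB free for the KV cache and overhead.
gpu_layers = layers_that_fit(20 * GIB, 48, 12 * GIB, 3 * GIB)
print(gpu_layers, "layers on GPU,", 48 - gpu_layers, "on CPU")
```

The result is roughly the kind of value you would pass to llama.cpp's `--n-gpu-layers` option; because real layer sizes vary, treat it as a starting point for tuning rather than an exact answer.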

Select the right hardware for your local LLM deployment with this online guide - CNX Software

… The website lists common hardware with price, performance (tokens/s), power consumption, and more for various LLMs. …

Mar 30, 2026 · Jean-Luc Aufranc (CNXSoft)

Google battles Chinese open weights models with Gemma 4

… Alongside these models are a pair of LLMs optimized for low-end edge hardware like smartphones and single board computers, like the Raspberry Pi. …

Apr 2, 2026 · Tobias Mann

Run OpenClaw Locally On AMD Ryzen™ AI Max+ Processors and Radeon™ GPUs

… Set context to 190000 and make sure GPU Offload is set to MAX. …

Apr 13, 2026

Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron | NVIDIA Technical Blog

… Because whole layers vary in size, each GPU needs to collect differently sized parameter updates from different GPUs through all-gatherv. …

Apr 22, 2026 · Hao Wu
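The all-gatherv in the result above is a variable-size all-gather: because each GPU contributes a differently sized chunk, the collective needs per-rank counts and displacements (offsets into the receive buffer), the same bookkeeping MPI exposes through `MPI_Allgatherv`. The snippet doesn't show Megatron's implementation, so this is a minimal single-process Python simulation of that bookkeeping only; `allgatherv` here is a hypothetical helper, not a Megatron or MPI API.

```python
from itertools import accumulate


def allgatherv(shards: list[list[float]]) -> tuple[list[float], list[int], list[int]]:
    """Single-process simulation of a variable-size all-gather.

    shards[i] stands in for the parameter update held by "rank" i.
    Returns the gathered buffer plus the counts and displacements a real
    collective would need to place each rank's data.
    """
    counts = [len(s) for s in shards]
    # Displacement of each shard = exclusive prefix sum of the counts.
    displs = [0, *accumulate(counts)][:-1]
    recv = [0.0] * sum(counts)
    for shard, offset in zip(shards, displs):
        recv[offset:offset + len(shard)] = shard
    # In a real collective, every rank ends up holding this same buffer.
    return recv, counts, displs


# Hypothetical: three ranks whose layer updates have 3, 1, and 2 elements.
buf, counts, displs = allgatherv([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
print(counts, displs)  # [3, 1, 2] [0, 3, 4]
print(buf)             # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

With equal-sized shards a plain all-gather would do, since every offset is just rank × shard size; the counts and displacements only become necessary when sizes differ, as they do for whole layers.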

Linux 7.0 Release, Age Verification Laws, Ryzen 9 9950X3D2 & Other April Happenings

… Given the current re-testing for the imminent Ubuntu 26.04 release, I am still going through all of the benchmarks especially for the multi-GPU scenarios. …

May 1, 2026

Discussions and forums

r/homelab · u/AntifaAustralia · 2w ago

My first 10 inch rack with local LLM! No more Spotify, Google Home, Netflix, ChatGPT...

I'm pretty new to homelabbing and this is my first mini rack! Started with the Beelink ME Mini and then just kinda grew from there (it's always the way hey haha). It idles at 70 watts (not too shabby for how much is goin…

r/LocalLLaMA · u/APFrisco · 1w ago

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and al…

r/LocalLLaMA · u/janvitos · 2w ago

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec w…

r/LocalLLaMA · u/ex-arman68 · 2w ago

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

2026-05-07 edit: I have updated the hardware based recommendations with more focus on quality. I do not recommend q4_0 KV cache anymore beyond 64k context. After multiple rounds of testing with the different size quants,…
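The advice above about q4_0 KV cache at long context is easier to follow with the KV cache size formula for standard attention: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The Python sketch below uses hypothetical model dimensions, not Qwen 3.6 27B's actual configuration, and assumes llama.cpp's block formats, where q8_0 and q4_0 each store 32 values plus one fp16 scale per block. It only shows why cache size grows linearly with context; it says nothing about the quality trade-off the poster reports.

```python
# Approximate storage cost per element for llama.cpp KV cache types.
# q8_0: 32 int8 values + 2-byte fp16 scale = 34 bytes per 32 values.
# q4_0: 32 4-bit values (16 bytes) + 2-byte fp16 scale = 18 bytes per 32 values.
BITS_PER_ELEM = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits
    "q4_0": 18 * 8 / 32,  # 4.5 bits
}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, cache_type: str = "f16") -> float:
    """KV cache size in GiB, assuming every layer uses full attention."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = keys + values
    return elems * BITS_PER_ELEM[cache_type] / 8 / 1024 ** 3


# Hypothetical model shape at a 262,144-token context.
for t in ("f16", "q8_0", "q4_0"):
    print(t, kv_cache_gib(64, 4, 128, 262_144, t), "GiB")
```

Models that mix in sliding-window or linear-attention layers keep a smaller cache than this full-attention estimate, so the formula is an upper bound for them.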

r/selfhosted · u/lazycodewiz · 1w ago

services with actually generous free tiers for open-source projects. my list, what would you add?

Been in the weeds shipping an OSS side project for the past few weeks (social media publishing API). Real launch post is coming, this isn't that. Along the way I kept a list of services that actually have usable free tie…

Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads | NVIDIA Technical Blog

… A pod requests nvidia.com/gpu: 1, and the scheduler binds it to a physical device. Large language models (LLMs) like NVIDIA Nemotron, Llama 3, or Qwen 7B/8B require dedicated compute to maintain low time to first token (TTFT) and high batch throughput. …

Mar 25, 2026 · Sagar Desai

Windows Local AI: AI model deployment using Windows ML on AMD NPU

… Summary of different deployment options for LLMs on AMD NPU. Additional Examples: Explore more advanced examples in the RyzenAI-SW repository. ResNet: https://github.com/amd/RyzenAI-SW/tree/main/WinML/CNN/ResNet and GoogleBERT Transformer: https://github.com/amd/RyzenAI-SW/tree/main/WinML/Transformers/Go… …

Apr 21, 2026 · Dwith Chenna

NVIDIA Brings Up To 5x AI Acceleration To Windows 11 PCs Running RTX 40 & RTX 30 GPUs

… TensorRT-LLM Boosts AI For RTX 40 & RTX 30 GPU Owners. Today, NVIDIA confirmed that TensorRT-LLM AI acceleration will be available for all RTX De… …

Nov 15, 2023 · Hassan Mujtaba
