Google's Gemma 4 isn't the smartest local LLM I've run, but it's the one I reach for most
… You can also use traditional speculative decoding by pairing one of the smaller Gemma 4 models as a draft model with the 31B as the target. Because they share the same tokenizer and training pipeline, the acceptance rate is solid. …