The latest Gemma 4 models use a training trick to slash their on-device memory footprint
… The Gemma 4 models optimized with QAT are available in five sizes: Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B, and Gemma 4 31B. …
The process uses a technique called “speculative decoding,” in which a small drafter model proposes the next several words of the response before the main Gemma model has generated them. The main model then checks the whole proposed sequence in a single pass and keeps the words it agrees with, while the drafter moves on to the next sequence of words.
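To make the draft-then-verify loop concrete, here is a minimal Python sketch of greedy speculative decoding. It is an illustration under stated assumptions, not Google's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the main model and the drafter, tokens are plain integers, and the verification step is written as a loop where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its next token


def greedy_decode(next_token: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: generate n tokens one at a time with a single model."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_token(tokens))
    return tokens[len(prompt):]


def speculative_decode(target_next: NextToken, draft_next: NextToken,
                       prompt: List[Token], max_new_tokens: int,
                       k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the drafter proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # Step 1: the cheap drafter proposes up to k tokens in a row.
        draft: List[Token] = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))

        # Step 2: the target checks each drafted position. (Assumption: a real
        # system does this in one batched pass; here it is a simple loop.)
        accepted: List[Token] = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)   # drafter guessed right, keep it
            else:
                accepted.append(expected)   # first miss: use the target's token
                break                       # and discard the rest of the draft
        else:
            # Every drafted token was accepted, so the target's pass also
            # yields one extra token for free (if there is room left).
            if len(tokens) + len(accepted) < goal:
                accepted.append(target_next(tokens + accepted))

        tokens.extend(accepted)
    return tokens[len(prompt):]


# Hypothetical toy models: the target counts upward modulo 10; the drafter
# agrees with it except right after a 4, where it guesses wrong.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], 12))
```

Because every accepted token is one the target model would have chosen itself, the output matches plain greedy decoding with the target alone. Any speedup comes from how often the drafter's guesses are accepted.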
Google's latest trick gets Gemma 4 running 3x faster right on your phone
… What that means is that Gemma 4 12B can handle multimodal inputs, just like the other Gemma models, but without the added overhead of encoding such inputs. …
… New assistant models share Gemma 4's workload for much less memory. …
… What to expect from Gemma 4: Gemma 4 requires devices running at least Android 12 or iOS 17, and you can now perform tasks that used to require a data connection. …
… Hence, DiffusionGemma is not meant to replace existing Gemini or Gemma models. …
… Still, the exciting news is that Gemma 4 has versions small enough to run on your smartphone. Specifically, Gemma 4 E2B and E4B are distilled down to effective two- and four-billion-parameter footprints. …