Google Announces TPU v8t Sunfish and TPU v8i Zebrafish
… On-chip SRAM bandwidth is roughly an order of magnitude higher than HBM bandwidth, so every KV-cache read served from SRAM instead of HBM shortens per-token latency and raises tokens per second at the same power. …
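
To make the arithmetic behind that claim concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes decode is purely memory-bound, that SRAM and HBM reads happen back to back with no overlap, and that SRAM bandwidth is exactly 10x HBM bandwidth (the "order of magnitude" above taken literally). The function names, the 1 GB KV-cache size, and the 1 TB/s HBM figure are hypothetical placeholders for illustration, not published TPU v8 specifications.

```python
def kv_read_time_s(kv_bytes, sram_hit_fraction, hbm_bw_bytes_per_s,
                   sram_to_hbm_ratio=10.0):
    """Seconds to read kv_bytes when a fraction of the reads hits SRAM.

    Simplified model (assumption): reads are serialized, and SRAM
    bandwidth is sram_to_hbm_ratio times the HBM bandwidth.
    """
    if not 0.0 <= sram_hit_fraction <= 1.0:
        raise ValueError("sram_hit_fraction must be between 0 and 1")
    sram_bw_bytes_per_s = hbm_bw_bytes_per_s * sram_to_hbm_ratio
    sram_time = kv_bytes * sram_hit_fraction / sram_bw_bytes_per_s
    hbm_time = kv_bytes * (1.0 - sram_hit_fraction) / hbm_bw_bytes_per_s
    return sram_time + hbm_time


def speedup_vs_hbm_only(sram_hit_fraction, sram_to_hbm_ratio=10.0):
    """Read-time speedup relative to serving every KV read from HBM."""
    if not 0.0 <= sram_hit_fraction <= 1.0:
        raise ValueError("sram_hit_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - sram_hit_fraction)
                  + sram_hit_fraction / sram_to_hbm_ratio)


if __name__ == "__main__":
    KV_BYTES = 1e9   # hypothetical KV-cache bytes read per decoded token
    HBM_BW = 1e12    # hypothetical HBM bandwidth in bytes per second
    for hit in (0.0, 0.5, 0.9, 1.0):
        t_ms = kv_read_time_s(KV_BYTES, hit, HBM_BW) * 1e3
        print(f"SRAM hit fraction {hit:.1f}: {t_ms:.3f} ms per token, "
              f"{speedup_vs_hbm_only(hit):.2f}x vs HBM only")
```

Under these assumptions the gain is governed by the fraction of reads still going to HBM: even at a 90% SRAM hit rate, the remaining 10% of HBM traffic accounts for roughly half of the total read time, so the full 10x only appears when essentially every KV read is served from SRAM.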