GKE Inference Gateway prefix caching accelerates AI inference | Google Cloud Blog
… Source: Principled Technologies

Metric | GKE | 3rd party Managed Kubernetes Service | GKE Advantage
Mean output token throughput | 7,169.21 output tokens per second | 6,042.05 output tokens per second | 15.7% more output token throughput
Mean time to first token (TTFT) | 188.36 ms | 2,624.73 ms | 92.8% less TTFT
Mean inter-to… …