GKE Inference Gateway prefix caching accelerates AI inference | Google Cloud Blog
…We appreciate the open-source nature of llm-d, as it enables seamless integration with our Envoy-based Service Mesh.” - Vinay Kola, Senior Manager, Software Engineering, Snap Inc. In this blog, we…
