Paper page - Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

… This workload is usually described as memory-bandwidth-bound. Each decode step streams model weights and the active KV cache, so latency should scale with peak HBM bandwidth. …

Jun 1, 2026
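
The conventional model quoted in the snippet can be written as a back-of-the-envelope bound: if each batch-1 decode step must read all weights and the active KV cache once, the per-token latency is at least the bytes moved divided by peak bandwidth. The sketch below illustrates only that usual assumption, not the paper's own analysis or results. The function name and all numeric values are hypothetical, chosen for illustration.

```python
def decode_latency_lower_bound_s(weight_bytes: float,
                                 kv_cache_bytes: float,
                                 peak_bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound on per-token decode latency, in seconds.

    Assumes every decode step streams all weights and the active KV cache
    exactly once at peak bandwidth, with no other overheads.
    """
    if peak_bandwidth_bytes_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    return (weight_bytes + kv_cache_bytes) / peak_bandwidth_bytes_per_s


# Hypothetical inputs (assumptions, not figures from the paper):
# 14 GB of weights, 1 GB of active KV cache, 2 TB/s peak HBM bandwidth.
latency = decode_latency_lower_bound_s(14e9, 1e9, 2e12)
print(f"{latency * 1e3:.2f} ms per token")  # prints "7.50 ms per token"
```

Under this model, doubling peak bandwidth halves the bound. The paper's title suggests that measured batch-1 latency does not follow this scaling in practice.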