Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog
…As long-context, multi-turn, and multi-agent workloads push toward millions of tokens, KV cache capacity requirements grow quickly, forcing that state into either scarce GPU HBM or durability-optimized enterprise storage…
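To give a rough sense of the scale involved, here is a minimal sketch of the standard KV cache sizing arithmetic for a decoder-only transformer: one key vector and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the model shape (80 layers, 8 KV heads, head dimension 128, 2-byte elements) are assumptions chosen for illustration. They are not taken from the source and do not describe any specific model or the Vera Rubin platform.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache for one sequence.

    Each token stores one key and one value vector (the factor of 2)
    per layer and per KV head, each of length head_dim.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


# Hypothetical model shape, for illustration only (not from the source).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for tokens in (8_192, 131_072, 1_000_000):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:8.1f} GiB")
```

Under these assumed numbers the cache costs 320 KiB per token, so a single million-token context needs roughly 305 GiB before batching or multiple concurrent sessions are considered. The footprint is linear in context length for one sequence and multiplies again across every active session or agent.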