NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer | NVIDIA Technical Blog
…CMX is optimized to store and serve massive context memory (KV cache), treating temporary inference context as an AI‑native, shared data type that can be reused across turns, sessions, and agents…
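The reuse pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not CMX's interface. It assumes KV cache is stored in fixed-size token blocks, and that each block is keyed by a hash chained over all preceding blocks. Any request that shares a token prefix can then look up and reuse blocks that were already computed instead of recomputing them during prefill. Such a request could be a later turn in the same session, a new session, or another agent with the same system prompt. The names `SharedKVStore`, `block_keys`, and `BLOCK_SIZE` are illustrative assumptions, and string placeholders stand in for the actual key/value tensors.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical block size: tokens per KV block (illustrative; real systems use larger blocks).
BLOCK_SIZE = 4


def block_keys(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chain-hash each full block so a key identifies the entire prefix up to that block."""
    keys: list[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256((parent + ":" + ",".join(map(str, block))).encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


@dataclass
class SharedKVStore:
    """Hypothetical shared context-memory tier mapping prefix-block keys to KV payloads."""
    blocks: dict[str, object] = field(default_factory=dict)

    def cached_prefix_len(self, tokens: list[int]) -> int:
        """Return how many leading tokens already have their KV blocks stored."""
        hits = 0
        for key in block_keys(tokens):
            if key not in self.blocks:
                break
            hits += 1
        return hits * BLOCK_SIZE

    def put(self, tokens: list[int], kv_for_block: Callable[[int], object]) -> int:
        """Store KV for every full block not yet cached; return the number newly written."""
        written = 0
        for i, key in enumerate(block_keys(tokens)):
            if key not in self.blocks:
                self.blocks[key] = kv_for_block(i)
                written += 1
        return written


store = SharedKVStore()
system = [1, 2, 3, 4, 5, 6, 7, 8]            # shared system prompt (2 blocks)
turn1 = system + [10, 11, 12, 13]            # session A, turn 1
store.put(turn1, lambda i: f"kv-block-{i}")  # placeholder payloads, not real tensors

turn2 = turn1 + [20, 21, 22, 23]             # session A, turn 2 extends turn 1
print(store.cached_prefix_len(turn2))        # 12: only the 4 new tokens need prefill

agent_b = system + [30, 31, 32, 33]          # a different agent, same system prompt
print(store.cached_prefix_len(agent_b))      # 8: the system-prompt blocks are reused
```

Chaining each block's hash to its parent means a key matches only when the entire preceding context is identical. Because a block's keys and values depend on every earlier token, this condition is what makes a cached block safe to reuse. A production context-memory tier would also need eviction, capacity tiering, and transfers between storage and GPU memory, none of which this sketch models.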