Inside the NVIDIA Vera Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog
… G3.5 tier: Ethernet-attached flash purpose-built for KV cache, positioned between the local tiers (HBM, DRAM, and local SSD) and durable shared storage, so context stays close enough to be reused without paying “G4 latency.”

BlueField-4 offload: BlueField-4 runs the KV I/O plane and efficiently terminates NV… …
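The G3.5 description places Ethernet-attached flash between the local tiers and durable shared storage. The sketch below is a toy illustration of what that ordering means for a KV-cache lookup: tiers are searched from fastest to slowest, and the first tier holding the block serves it. Several parts of it are assumptions and do not come from the platform itself. The G1/G2/G3 labels for HBM, DRAM, and local SSD are assumed, since the text names only "G3.5" and "G4". The class `TieredKVCache` and the constant `TIER_ORDER` are hypothetical. The promote-on-hit policy is also illustrative, not a documented behavior.

```python
from dataclasses import dataclass, field

# Hypothetical tier ordering, fastest to slowest. Mapping HBM, DRAM, and local
# SSD to G1-G3 is an assumption; only "G3.5" and "G4" are named in the text.
TIER_ORDER = ["G1_HBM", "G2_DRAM", "G3_LOCAL_SSD", "G3.5_ETH_FLASH", "G4_SHARED_STORAGE"]


@dataclass
class TieredKVCache:
    """Toy model of a tiered KV-cache store keyed by a prefix hash (hypothetical)."""

    tiers: dict = field(default_factory=lambda: {t: {} for t in TIER_ORDER})

    def put(self, tier: str, key: str, block: bytes) -> None:
        # Place a KV block directly into the named tier.
        self.tiers[tier][key] = block

    def get(self, key: str):
        """Search tiers from fastest to slowest; return (tier, block) or (None, None)."""
        for tier in TIER_ORDER:
            block = self.tiers[tier].get(key)
            if block is not None:
                # Illustrative policy: copy a hit into the fastest tier so the
                # next reuse of the same context is served locally.
                self.tiers[TIER_ORDER[0]][key] = block
                return tier, block
        return None, None


if __name__ == "__main__":
    cache = TieredKVCache()
    cache.put("G3.5_ETH_FLASH", "prefix-abc", b"kv-block")
    print(cache.get("prefix-abc")[0])  # first hit comes from the G3.5 tier
    print(cache.get("prefix-abc")[0])  # after promotion, served from G1
```

In this model, a block that lives in the G3.5 tier is found before the lookup ever reaches G4. That is the reuse-without-G4-latency property the excerpt describes, reduced to a search order.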