768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second
…The discontinued memory format was designed to bridge the DRAM-SSD divide. While the 768GB of Optane (6x 128GB) does indeed offer far lower latency than the best NVMe SSDs, it is…
