The iPhone 17 Pro can run a 400B-parameter large language model on-device by streaming weights from the SSD
…After a series of optimizations and a custom Metal GPU pipeline written in Objective-C, the project demonstrates that streaming mixture-of-experts (MoE) models from consumer-grade SSDs is feasible and yields acceptable results…
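Streaming works well for MoE models because a router activates only a few experts per token, so only those experts' weights need to be read from storage. The Python snippet below is a minimal sketch of that idea under assumed conditions. It is not the project's Objective-C/Metal implementation. The file layout (experts stored back to back as flat float32 arrays), the sizes, and the helper names `write_experts`, `top_k` and `load_experts` are all hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout (an assumption, not the project's format):
# NUM_EXPERTS experts stored back to back, each a flat float32 array.
NUM_EXPERTS = 8
EXPERT_SIZE = 4                      # float32 values per expert (tiny, for illustration)
BYTES_PER_EXPERT = EXPERT_SIZE * 4   # 4 bytes per float32


def write_experts(path):
    """Write dummy weights: every value of expert e is float(e)."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            f.write(struct.pack(f"<{EXPERT_SIZE}f", *([float(e)] * EXPERT_SIZE)))


def top_k(scores, k):
    """Return the indices of the k highest router scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def load_experts(path, expert_ids):
    """Read only the selected experts from disk by seeking to their byte offsets."""
    weights = {}
    with open(path, "rb") as f:
        for e in expert_ids:
            f.seek(e * BYTES_PER_EXPERT)
            raw = f.read(BYTES_PER_EXPERT)
            weights[e] = list(struct.unpack(f"<{EXPERT_SIZE}f", raw))
    return weights


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "experts.bin")
    write_experts(path)
    router_scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.0, 0.3, 0.4]  # made-up router output
    active = top_k(router_scores, k=2)
    loaded = load_experts(path, active)

bytes_read = len(active) * BYTES_PER_EXPERT
total_bytes = NUM_EXPERTS * BYTES_PER_EXPERT
print(active, bytes_read, "of", total_bytes, "bytes read")
```

With top-2 routing over 8 experts, this toy example reads a quarter of the expert weights for the token. The same ratio argument explains why a model far larger than device RAM can still be served from an SSD. Real systems add caching, prefetching and GPU-side buffers on top of this.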
