The iPhone 17 Pro can run a 400B-parameter Large Language Model on-device by streaming weights from the SSD
…The project builds on Apple's "LLM in a Flash" research, in which model weights are streamed on demand directly from the device's NVMe storage instead of preloading the entire 400B parameter…
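The core idea of streaming weights on demand can be illustrated with a memory-mapped weight file from which only the rows that are actually needed get read. The following is a minimal sketch, not the project's code. The file layout (a single row-major float32 matrix), the toy sizes, and every name (`write_weights`, `StreamedMatrix`, `sparse_matvec`) are hypothetical. The "LLM in a Flash" paper pairs this kind of selective loading with a predictor that guesses which feed-forward neurons will be active. Here the active set is simply passed in by hand.

```python
import array
import mmap
import os
import tempfile

# Hypothetical toy layer: 8 neurons (rows), hidden size 4 (columns), float32.
ROWS, COLS = 8, 4
ITEM = 4  # bytes per float32


def write_weights(path, rows, cols):
    """Write a toy row-major float32 matrix to disk (stand-in for a model file)."""
    data = array.array("f", (float(r * cols + c) for r in range(rows) for c in range(cols)))
    with open(path, "wb") as f:
        data.tofile(f)


class StreamedMatrix:
    """Row-major float32 matrix read lazily through a memory map.

    Nothing is preloaded. Slicing the mapping for a row makes the OS page in
    only the byte range that backs that row.
    """

    def __init__(self, path, rows, cols):
        self.rows, self.cols = rows, cols
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def load_rows(self, indices):
        row_bytes = self.cols * ITEM
        out = {}
        for i in indices:
            start = i * row_bytes
            row = array.array("f")
            row.frombytes(self._mm[start:start + row_bytes])  # read just this row
            out[i] = row
        return out

    def close(self):
        self._mm.close()
        self._f.close()


def sparse_matvec(matrix, x, active):
    """Compute outputs only for the given active neurons, reading only their rows."""
    rows = matrix.load_rows(active)
    return {i: sum(w * xi for w, xi in zip(rows[i], x)) for i in active}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ffn_toy.bin")
    write_weights(path, ROWS, COLS)
    m = StreamedMatrix(path, ROWS, COLS)
    print(sparse_matvec(m, [1.0, 0.0, 0.0, 1.0], active=[2, 5]))
    m.close()
```

Only rows 2 and 5 are touched, so only 32 of the file's 128 bytes are read. At 400B-parameter scale, the same principle determines how much data has to cross the storage bus for each token.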