The iPhone 17 Pro can run a 400B-parameter large language model on-device by streaming weights from its SSD
…Although slow at 0.6 tokens/sec, the setup shows that large models can run on consumer devices without loading all of their weights into memory, and it identifies SSD read speed as the main bottleneck. A new open-source inference…
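The bottleneck claim follows from simple arithmetic. If the weights needed for each token have to be read from storage, decode speed can be no faster than the read bandwidth divided by the bytes read per token. The sketch below is a back-of-the-envelope estimate under stated assumptions: reads dominate, compute and read/compute overlap are ignored, and a fraction of the weights may already sit in RAM. The function name, the `resident_fraction` parameter, and the example numbers are hypothetical and are not figures from the project or from Apple.

```python
def max_tokens_per_sec(active_params: float, bits_per_weight: float,
                       read_bandwidth_bytes_per_sec: float,
                       resident_fraction: float = 0.0) -> float:
    """Upper bound on decode speed when weights are streamed from storage.

    Hypothetical model: every weight used for a token that is not already
    resident in RAM is read from storage once per generated token, and
    storage reads (not compute) are the limiting factor.
    """
    if not 0.0 <= resident_fraction < 1.0:
        raise ValueError("resident_fraction must be in [0, 1)")
    # Bytes that must come off storage for each generated token.
    bytes_per_token = active_params * bits_per_weight / 8 * (1.0 - resident_fraction)
    return read_bandwidth_bytes_per_sec / bytes_per_token


# Purely illustrative inputs, NOT measured iPhone 17 Pro figures:
# 10B weights touched per token, 4-bit quantization, 2 GB/s sustained reads.
rate = max_tokens_per_sec(10e9, 4, 2e9)
print(f"{rate:.2f} tokens/sec")  # prints "0.40 tokens/sec"
```

Because the bound scales linearly with read bandwidth, a faster SSD raises the ceiling proportionally. Under these same assumptions, keeping half of the needed weights in RAM (`resident_fraction=0.5`) would double it.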