Most people use Ollama or llama.cpp for local LLMs, but these are the tools I switch to when it gets serious
…A-series GPUs, and Android through OpenCL on Adreno and Mali GPUs. MLC fills a different role from a normal server runtime, even though it can expose OpenAI-compatible APIs. It's…
