I replaced cloud LLMs with local models running off a Proxmox LXC, and the performance trade-off was worth it
… Rather, I began using the llama-server functionality to create an LLM server that remains operational 24/7 and hooks up to the rest of my FOSS arsenal thanks to its OpenAI-compatible API. …
