New in llama.cpp: Model Management
… Just noticed one of my apps broke because it's used to llama-server not requiring a model name. · This seems to work DEFAULT port = 8080 n-gpu-layers = -1 device = 0 flash-attn = on chat-template = jinja models-max = 4 Does it unload the current model if VRAM is full, to allow swapping to a new mod… …