
Showing top 1 result for "VRAM as swap"

New in llama.cpp: Model Management

… Just noticed one of my apps broke because it's used to llama-server not requiring a model name. · This seems to work:

DEFAULT
port = 8080
n-gpu-layers = -1
device = 0
flash-attn = on
chat-template = jinja
models-max = 4

Does it unload the current model if VRAM is full, to allow swapping to a new mod… …

Dec 11, 2025