How NetEase Games achieved 30-second LLM cold starts on Kubernetes
… Keeping enough GPU capacity online for peak demand across every team was inefficient. Second, inference traffic was not uniform. Some titles peaked in the evening, others during the day. Some workloads were latency-sensitive online inference; others were batch jobs or fine-tuning tasks. …