Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog
…Disaggregated serving splits the pipeline into distinct stages, such as prefill, decode, and routing, each running as an independent service (see Figure 1, below).

Aggregated inference

In a traditional aggregated setup, a single…