Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision | NVIDIA Technical Blog
… Synchronization: The newly calculated scales are then synchronized to the vLLM inference engine for the subsequent rollout phase. This design ensures that the rollout engine always uses optimal quantization scales derived from the latest policy state, minimizing accuracy degradation. …
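
To make the recalculate-then-synchronize step concrete, here is a minimal sketch in plain Python. It assumes per-tensor scaling with scale = amax / 448 (448 is the largest finite magnitude in FP8 E4M3), which is a common recipe but is not confirmed by the excerpt as the one used here. `RolloutEngineStub`, `load_scales`, `compute_scales`, and `sync_after_update` are hypothetical names for illustration only; they are not part of vLLM's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def compute_scales(weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed recipe: per-tensor scale = amax / FP8_E4M3_MAX."""
    scales = {}
    for name, values in weights.items():
        amax = max((abs(v) for v in values), default=0.0)
        # Fall back to 1.0 for all-zero tensors so dividing by the scale is safe.
        scales[name] = amax / FP8_E4M3_MAX if amax > 0.0 else 1.0
    return scales


@dataclass
class RolloutEngineStub:
    """Hypothetical stand-in for the rollout engine; NOT vLLM's real interface."""
    scales: Dict[str, float] = field(default_factory=dict)
    version: int = -1

    def load_scales(self, scales: Dict[str, float], version: int) -> None:
        # Copy so later changes on the trainer side do not leak into the engine.
        self.scales = dict(scales)
        self.version = version


def sync_after_update(weights: Dict[str, List[float]],
                      engine: RolloutEngineStub,
                      step: int) -> Dict[str, float]:
    """Recompute scales from the latest policy weights, then push them."""
    scales = compute_scales(weights)
    engine.load_scales(scales, version=step)
    return scales


# Usage: after training step 1, the engine holds scales tagged with that step.
engine = RolloutEngineStub()
policy_weights = {
    "layer0.weight": [0.5, -2.24, 1.0],
    "layer1.weight": [0.0, 0.0],
}
latest_scales = sync_after_update(policy_weights, engine, step=1)
```

Tagging the pushed scales with the training step is a design choice of this sketch: it lets the rollout side check that its scales match the policy version it is about to sample from.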