Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision | NVIDIA Technical Blog
…FP8 is applied exclusively during generation, while policy model training is conducted in BF16. Final recipe: End-to-end FP8: we use FP8 in both generation and training engines We observe that…
