V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
…To this end, we introduce Variational GRPO (V-GRPO), a method that integrates ELBO-based surrogates with the Group Relative Policy Optimization (GRPO) algorithm, alongside a set of simple yet essential techniques…
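The excerpt does not say exactly how the ELBO-based surrogates enter GRPO. The sketch below assumes one plausible reading and should not be taken as the authors' method. Denoising models have no tractable log-likelihood, so each sample's ELBO stands in for log p(x) inside GRPO's clipped importance ratio. Advantages are the usual GRPO group-relative scores: rewards standardized within a group of samples for the same prompt. The function names, the `clip_eps` default, and the idea that ELBO values arrive precomputed are all hypothetical.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize the rewards of one group of
    samples generated for the same prompt (no learned value function)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(elbo_new, elbo_old, advantages, clip_eps=0.2):
    """Clipped policy-gradient objective (to be maximized).

    Assumption: the intractable log-likelihood log p_theta(x) of a
    denoising model is replaced by an ELBO estimate, so
    exp(ELBO_new - ELBO_old) serves as a surrogate for p_new(x) / p_old(x).
    """
    terms = []
    for new, old, adv in zip(elbo_new, elbo_old, advantages):
        ratio = math.exp(new - old)  # surrogate importance ratio
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # Pessimistic (min) bound, as in PPO/GRPO.
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)


if __name__ == "__main__":
    # Hypothetical rewards for a group of four samples from one prompt.
    adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
    # Hypothetical per-sample ELBO estimates under the new and old models.
    elbo_new = [0.10, -0.05, 0.00, 0.02]
    elbo_old = [0.00, 0.00, 0.00, 0.00]
    print(adv)
    print(clipped_surrogate(elbo_new, elbo_old, adv))
```

In practice, an ELBO for a diffusion-style model is itself a Monte Carlo estimate over noise levels and noise draws. Reusing the same draws for `elbo_new` and `elbo_old` would presumably reduce the variance of their difference. This is an assumption of the sketch, not a detail stated in the excerpt.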