Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization
…For each condition, SGRPO samples a supergroup of candidate sets, compares their diversity under the same condition, and redistributes the group diversity reward to individual rollouts through leave-one-out diversity…
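The supergroup step described above can be illustrated with a small sketch. This is not the paper's implementation: the excerpt does not specify the diversity metric, the normalization, or the exact redistribution rule, so the code assumes (1) diversity is the mean pairwise normalized Hamming distance between equal-length sequences, (2) set-level diversity is standardized across the supergroup in the usual group-relative way, and (3) each rollout receives its set's standardized score plus its own leave-one-out contribution. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def set_diversity(seqs):
    """Assumed metric: mean pairwise Hamming distance within one candidate set."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a set with fewer than two members has no pairwise diversity
    return mean(hamming(a, b) for a, b in pairs)


def leave_one_out_contributions(seqs):
    """For each member: diversity of the full set minus diversity without it."""
    full = set_diversity(seqs)
    return [full - set_diversity(seqs[:i] + seqs[i + 1:]) for i in range(len(seqs))]


def supergroup_rollout_rewards(supergroup):
    """Per-rollout diversity rewards for candidate sets sampled under one condition.

    Assumed combination rule: standardized set-level diversity (relative to the
    other sets in the supergroup) plus the rollout's leave-one-out contribution.
    """
    divs = [set_diversity(group) for group in supergroup]
    mu, sigma = mean(divs), pstdev(divs)
    rewards = []
    for group, d in zip(supergroup, divs):
        set_score = 0.0 if sigma == 0 else (d - mu) / sigma
        loo = leave_one_out_contributions(group)
        rewards.append([set_score + c for c in loo])
    return rewards


# Two candidate sets sampled for the same (hypothetical) condition.
supergroup = [
    ["AAAA", "AAAA", "CCCC"],  # one distinct member, two duplicates
    ["AAAA", "CCCC", "GGGG"],  # all members distinct
]
rewards = supergroup_rollout_rewards(supergroup)
```

In this toy case the fully distinct set scores above the supergroup mean, and within the first set the lone distinct sequence earns a larger leave-one-out contribution than either duplicate, which is the credit-assignment behavior the excerpt describes.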
