ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
…Our code is available at https://github.com/hongruhou89/ProRL.

Standard policy gradients are fundamentally broken for proactive recommendation. ProRL fixes…