Paper page - ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
…Cross-evaluator analysis with unseen recommendation models (GRU4Rec, BERT4Rec, LightSANs) confirms that the learned guidance strategy generalizes beyond the training environment. Happy to discuss, and comments are welcome :) 📄 Paper accepted at ICML 2026 💻 Code available…