Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
…Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over…