Models That Know How Evaluations Are Designed Score Safer
arXiv:2605.28591
Published on May 27
Submitted by Haritz Puerto on May 28
COMPASS research group at ELLIS Institute Tübingen
Authors: …, Haritz Puerto, …, …

Abstract
Fine-tuning models on synthetic documents describing evaluation traits …