Models That Know How Evaluations Are Designed Score Safer
…Evaluating this fine-tuned model on six safety benchmarks, we find that it is significantly safer than both the base model and the control model. This behavioral shift persists even when restricting the analysis…