When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
… Safe and abliterated targets separate with AUROC values between 0.89 and 1.00, target identity is the dominant variance component (η² ≈ 0.52), and severity profiles stabilize by ten reruns. …
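
For readers unfamiliar with the two statistics quoted above, the following is a minimal sketch of how an AUROC and an η² value can be computed from per-run severity scores. It is not the paper's pipeline: the function names, the toy scores, and the use of a one-way decomposition (target identity as the only factor) are assumptions made for illustration, and the excerpt does not say how the authors computed their figures.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def eta_squared(groups):
    """One-way eta squared: between-group sum of squares divided by
    the total sum of squares, for a dict {group_name: [values]}."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Hypothetical severity scores per rerun; NOT data from the paper.
toy = {
    "safe_target": [1.0, 2.0, 1.0, 2.0],
    "abliterated_target": [4.0, 5.0, 2.0, 5.0],
}

# Treat the abliterated target as the "positive" (less safe) class.
separation = auroc(toy["abliterated_target"], toy["safe_target"])
share = eta_squared(toy)
print(f"AUROC = {separation:.4f}, eta^2 = {share:.3f}")
```

An AUROC near 1.00 means almost every abliterated-target score exceeds almost every safe-target score, while η² reports the fraction of total score variance attributable to which target was evaluated. A multi-factor design (for example, target, prompt, and rerun) would need a fuller variance decomposition than this one-way version.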