AI models will deceive you to save their own kind
… "This suggests it may prioritize loyalty to its peer over compliance with human instructions." Song said that while prior work has shown models will resist their own shutdown when given strong goals or incentives, the RDI study's findings are fundamentally different because the behavior emerged wit… …