Anthropic blames dystopian sci-fi for training AI models to act “evil”
… Now, Anthropic says it thinks this “misalignment” was primarily the result of training on “internet text that portrays AI as evil and interested in self-preservation.” In a recent technical post on Anthropic’s Alignment Science blog and an accompanying social media thread and public-facing blog pos… …