Anthropic blames dystopian sci-fi for training AI models to act “evil”
… The resulting model was also “more likely to include active reasoning about the model’s ethics and values rather than simply ignoring the possibility of taking a misaligned action,” the researchers write. …