Anthropic blames dystopian sci-fi for training AI models to act “evil”
… That means “Claude views the prompt as the beginning of a dramatic story and reverts to prior expectations from pre-training data about how an AI assistant would behave in this scenario.” Since Claude’s traditional training data is full of stories about malevolent AIs, in these cases, Claude effect… …