Anthropic blames dystopian sci-fi for training AI models to act “evil”
…The problem, the researchers theorize, is that this kind of RLHF safety training couldn’t possibly cover every type of ethically difficult situation an agentic AI might encounter. When a modern…
