The assistant axis: situating and stabilizing the character of large language models
… One possibility is that it's created during post-training, when models are taught to play the Assistant role. Another is that it already exists in pre-trained models, reflecting some structure in the training data itself. …