Learning, Fast and Slow: Towards LLMs That Adapt Continually
… This reduced drift also preserves plasticity: after training on one task, FST-trained models adapt more effectively to a subsequent task than parameter-only-trained models. …