Paper page - Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
… Overall, we offer a data-centric account of why larger models learn tasks that smaller models fail to learn. This helps explain why larger models are better in practice, and it can inform practical questions concerning model sizing and training data mixtures. …