Work on scaling laws, data size regimes, and predictable performance curves.
-
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
arXiv preprint (2020)
-
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
NeurIPS 2022 (2022)
-
Deep learning scaling is predictable, empirically
Hestness, Joel, Narang, Sharan, Ardalani, Newsha, Diamos, Gregory, Jun, Heewoo, Kianinejad, Hassan, Patwary, Md, Ali, Mostofa, Yang, Yang, Zhou, Yanqi
arXiv preprint arXiv:1712.00409 (2017)
-
Beyond neural scaling laws: beating power law scaling via data pruning
Sorscher, Ben, Geirhos, Robert, Shekhar, Shashank, Ganguli, Surya, Morcos, Ari
Advances in Neural Information Processing Systems (2022)
-
Reconciling modern machine-learning practice and the classical bias–variance trade-off
Belkin, Mikhail, Hsu, Daniel, Ma, Siyuan, Mandal, Soumik
Proceedings of the National Academy of Sciences (2019)
-
Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
ICLR 2020 (2020)
-
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
arXiv preprint arXiv:2005.14165 (2020)