Lecture Series: Neural Scaling Laws
We are honored to present the latest installment of the Mehta Family Foundation Lecture Series. We take great pride in curating engaging talks by the most influential and accomplished researchers, thinkers, and creators of our time. These lectures are designed to explore new and important ideas at the forefront of their fields. By participating in our lecture series, you help us bring these new and exciting ideas to a larger and more diverse audience. We are confident that you will leave each of these lectures feeling inspired and intellectually invigorated.
Revisiting the Economics of Large Language Models with Neural Scaling Laws and Dynamic Sparsity
The Neural Scaling Law informally states that increasing model size and training data automatically improves AI. However, this growth has now reached a tipping point at which the cost and energy consumption of AI are becoming prohibitive.
This talk will demonstrate algorithmic progress that can exponentially reduce the compute and memory costs of training and inference by using “dynamic sparsity” in neural networks.
Unlike static sparsity, dynamic sparsity aligns with Neural Scaling Laws: it reduces the number of FLOPs required by neural models by 99% or more without reducing the power of the networks.
We will show how data structures, particularly randomized hash tables, can be used to design an efficient “associative memory” that reduces the number of multiplications required to train neural networks. Current implementations of this idea challenge the prevailing belief in the community that specialized processors such as GPUs are significantly superior to CPUs for training large neural networks.
The resulting algorithm is orders of magnitude cheaper and more energy-efficient. Our careful implementations can train billion-parameter recommendation and language models on commodity desktop CPUs significantly faster than top-of-the-line TensorFlow alternatives running on the most powerful A100 GPU clusters, with the same or better accuracy. We will show some demos, including how to train a billion-parameter language model from scratch on a laptop and fine-tune it (with RLHF) for search, discovery, and summarization.
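To make the hash-table “associative memory” idea more concrete, here is a minimal Python sketch, assuming signed random projection (SimHash) as the hash family. Every neuron's weight vector is indexed into a few hash tables once. For each input, only the neurons that land in the same buckets as the input are computed, so the set of active neurons changes from input to input (dynamic sparsity). A single SimHash bit collides for two vectors with probability 1 − θ/π, where θ is the angle between them, so retrieved neurons tend to have weights pointing in a direction similar to the input. The names `SparseLayer` and `simhash` and the parameters `n_bits` and `n_tables` are hypothetical illustrations, not ThirdAI's implementation or API. A real training system would also need to update the tables as the weights change.

```python
import random
from collections import defaultdict


def simhash(vec, planes):
    """Signed random projection: one bit per hyperplane, set when the dot product is non-negative."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


class SparseLayer:
    """Hypothetical ReLU layer that evaluates only neurons retrieved from LSH buckets."""

    def __init__(self, n_in, n_out, n_bits=4, n_tables=3, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        # Each hash table gets its own set of random hyperplanes.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        # Index every neuron's weight vector into each table once, up front.
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        for j, w in enumerate(self.weights):
            for t, planes in enumerate(self.planes):
                self.tables[t][simhash(w, planes)].append(j)

    def active_neurons(self, x):
        """Union of the buckets that the input x hashes to, across all tables."""
        active = set()
        for t, planes in enumerate(self.planes):
            active.update(self.tables[t].get(simhash(x, planes), []))
        return active

    def forward(self, x):
        """Return {neuron index: ReLU activation} for the retrieved neurons only."""
        out = {}
        for j in self.active_neurons(x):
            out[j] = max(0.0, sum(w * v for w, v in zip(self.weights[j], x)))
        return out


layer = SparseLayer(n_in=16, n_out=1000)
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(16)]
acts = layer.forward(x)
print(f"computed {len(acts)} of 1000 neuron activations")
```

A dense version of this layer costs 16 × 1,000 multiplications per input; the sketch instead pays for hashing the input (3 tables × 4 bits × 16 inputs) plus 16 multiplications per retrieved neuron, and with 16 buckets per table it typically evaluates only a fraction of the 1,000 neurons. More bits per table make buckets smaller and the layer sparser, while more tables raise the chance of retrieving the neurons with the largest activations.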
About the Speaker
Anshumali Shrivastava is an associate professor in the computer science department at Rice University. He is also the Founder and CEO of ThirdAI Corp, a startup focused on democratizing Mega-AI models through “dynamic sparsity”. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the Top-10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including Best Paper Awards at NIPS 2014 and MLSys 2022 and the Most Reproducible Paper Award at SIGMOD 2019. His work on efficient machine learning technologies on CPUs has been covered by the popular press, including the Wall Street Journal, the New York Times, TechCrunch, NDTV, Engadget, and Ars Technica.