The Real Reason Huge AI Models Actually Work: Prof. Andrew Wilson
- Podcast: Machine Learning Street Talk (MLST)
- Host: Tim Scarfe
- Guest: Andrew Wilson, Professor at NYU’s Courant Institute of Mathematical Sciences
- Duration: ~2 hours 7 minutes
- Listen: Apple Podcasts | YouTube
Andrew Wilson challenges fundamental misconceptions about why large neural networks work. The answer is not what most people think.
The Bias-Variance Trade-Off Is Wrong
Conventional wisdom says that lowering bias must raise variance, and vice versa. Wilson calls the term a misnomer: larger models can achieve low bias and low variance at the same time, so there does not have to be a trade-off.
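For reference, the standard squared-error decomposition behind the term can be written as follows. The notation is an assumption of this note, not something from the episode: f is the true function, f̂ is the model fitted on a random training set, and the noise ε has mean zero and variance σ².

```latex
% Assumed setup: y = f(x) + \varepsilon, with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2.
% Expectations are taken over training sets and noise.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The identity only says that the terms add up. Nothing in it forces one term to rise when the other falls, which is consistent with the point that both can be small at once.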
Parameter Counting Is a Bad Proxy for Complexity
What matters is not how many parameters a model has, but the induced distribution over functions and its preferences for certain solutions. A model with 10,000 parameters can generalize better than a linear model if its implicit biases align with the structure of the data.
Double Descent and Simplicity Bias
Double descent: as model size increases, test error first decreases, then rises to a peak near the interpolation threshold (the point at which the model can just fit the training data), then decreases again. In the second descent, all models fit the training data perfectly, yet larger models generalize better. Wilson’s explanation is that larger models develop a stronger simplicity bias.
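A minimal sketch of why "fits the training data perfectly" does not pin down a single model. It is a toy linear setup chosen for illustration, not an experiment from the episode. With more parameters than data points, many weight vectors interpolate the data, and a preference is needed to pick among them. Here the preference is the smallest weight norm, used as a stand-in for simplicity (an assumption of this sketch). The function name `interpolators` and all sizes are hypothetical.

```python
import numpy as np

def interpolators(n_points=10, n_params=50, seed=0):
    """Return (X, y, w_min, w_other): two weight vectors that both fit y exactly."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_points, n_params))  # more parameters than data points
    y = rng.normal(size=n_points)

    # Minimum-norm interpolating solution via the Moore-Penrose pseudoinverse.
    X_pinv = np.linalg.pinv(X)
    w_min = X_pinv @ y

    # A second interpolator: add a component from the null space of X.
    # This changes the weights without changing any training prediction.
    z = rng.normal(size=n_params)
    null_component = z - X_pinv @ (X @ z)
    w_other = w_min + null_component
    return X, y, w_min, w_other

X, y, w_min, w_other = interpolators()
print("max train error (min-norm):", np.abs(X @ w_min - y).max())
print("max train error (other):   ", np.abs(X @ w_other - y).max())
print("weight norms:", np.linalg.norm(w_min), np.linalg.norm(w_other))
```

Both solutions have zero training error up to floating-point precision, so training fit alone cannot separate them. Only the added preference does.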
Honest Representation of Beliefs
Wilson’s philosophy is to represent your beliefs honestly. The real world is complicated, so a model should be expressive enough to capture it. Combining that expressiveness with a simplicity bias (Occam’s razor) produces behavior that adapts across data regimes. Prefer soft inductive biases to hard constraints.
Deep Learning and Solomonoff Induction
Large transformers combine expressiveness with a strong preference for low Kolmogorov complexity solutions. The real-world data distribution itself seems biased toward low complexity. The best models share that bias.
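Kolmogorov complexity is not computable, but the length of a compressed encoding gives a crude upper bound on it. The sketch below uses zlib as that stand-in (an assumption of this note, not a method from the episode) to compare data that follows a simple rule with data that has no structure a general-purpose compressor can find. The helper name `compressed_size` is hypothetical.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: a rough upper bound on description length."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)
structured = b"ab" * 500                                # 1000 bytes generated by a simple rule
noise = bytes(rng.getrandbits(8) for _ in range(1000))  # 1000 pseudo-random bytes

print("structured:", compressed_size(structured), "bytes")
print("noise:     ", compressed_size(noise), "bytes")
```

The proxy has clear limits. The pseudo-random bytes here come from a short seeded program, so their true Kolmogorov complexity is low even though zlib cannot compress them.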
Bayesian Methods and Occam’s Razor
Bayesian marginalization automatically incorporates Occam’s razor. It is an elegant approach to model selection that is often overlooked. With more expressive models, representing uncertainty honestly becomes increasingly important.
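A small worked example of the automatic Occam’s razor. It is a textbook-style coin-flip comparison chosen for illustration, not one taken from the episode. Two models explain a sequence of coin flips: a fair coin with no free parameters, and a flexible model whose unknown bias is marginalized over a uniform prior. The function names are hypothetical.

```python
from math import factorial

def evidence_fair(n_heads: int, n_tails: int) -> float:
    """p(D | fair coin): no free parameters, so every sequence has probability 0.5**n."""
    return 0.5 ** (n_heads + n_tails)

def evidence_flexible(n_heads: int, n_tails: int) -> float:
    """p(D | unknown bias), marginalizing the bias t over a uniform prior on [0, 1].

    The integral of t**h * (1 - t)**k over [0, 1] equals h! * k! / (h + k + 1)!.
    """
    return factorial(n_heads) * factorial(n_tails) / factorial(n_heads + n_tails + 1)

for heads, tails in [(5, 5), (9, 1)]:
    print(f"{heads} heads, {tails} tails:",
          f"fair={evidence_fair(heads, tails):.6f}",
          f"flexible={evidence_flexible(heads, tails):.6f}")
```

With 5 heads and 5 tails the fair coin has the higher marginal likelihood, because the flexible model spreads its probability over many possible biases. With 9 heads and 1 tail the flexible model wins. No explicit complexity penalty is added anywhere: the preference for the simpler model falls out of the marginalization.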
Practical Advice
Build models as large as you can afford, while incorporating some form of simplicity bias. Making models bigger is currently the most effective way to enhance simplicity bias. Wilson hopes for more elegant approaches in the future.