Interpreting Intelligence Part 1: Why new capabilities might emerge at scale

Three spooky things we’ve learned in 2023 about how neural networks learn. Part 1.


In AI circles there is a famous essay by Rich Sutton called The Bitter Lesson. Its core idea is this: the biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. Writing in 2019, Sutton was prescient in claiming that search and learning are the methods that scale arbitrarily, with the implication that a focus on those two will yield the greatest gains.

Now, in 2023, the scaling laws of large language models are firmly entrenched in researchers’ minds. Scaling laws in AI refer to the observed pattern that as models increase in size, data, and computational resources, their performance improves in a predictable manner, often following a power law, with new capabilities emerging along the way. This predictability has led many (by no means all) researchers to believe that AGI will happen within a few years, simply through more data and larger models, powered by ever bigger compute.
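To make "following a power law" concrete, here is a minimal sketch. The loss values below are illustrative numbers I made up, not results from any real training run; the point is that a power law L(N) = a · N^(−α) becomes a straight line in log-log space, so the exponent can be read off with a linear fit.

```python
import numpy as np

# Hypothetical loss values at increasing model sizes (illustrative only,
# not from any real training run): each 10x in parameters multiplies
# the loss by 0.8.
model_params = np.array([1e6, 1e7, 1e8, 1e9])
losses = np.array([4.0, 3.2, 2.56, 2.048])

# A power law L(N) = a * N^(-alpha) is linear in log-log space:
#   log L = log a - alpha * log N
# so a straight-line fit recovers the exponent alpha.
slope, intercept = np.polyfit(np.log10(model_params), np.log10(losses), 1)
alpha = -slope

print(f"fitted exponent alpha = {alpha:.4f}")
```

Because these points sit exactly on a power law, the fit recovers α = −log10(0.8) ≈ 0.097; real training curves are noisier, but the same log-log fit is the standard way scaling exponents are estimated.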

What fascinates me is unpacking what drives models' rapid improvement: sheer scale, or intrinsic learning dynamics?

On one hand, more data and more parameters reliably improve performance. Learning algorithms leverage this volume by discovering useful patterns; scope and repetition aid pattern recognition.

Yet model architecture matters too: networks have inherent statistical biases. Attention mechanisms concentrate signal, and convolutions exploit spatial locality. Do structures like these explain some of the progress?
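To see what an architectural bias buys you, compare parameter counts for the same task. The numbers below are a toy illustration (a single-channel 32×32 image mapped to the same shape): a fully connected layer must learn a weight for every pixel pair, while a 3×3 convolution bakes in the assumption that nearby pixels matter most and shares its handful of weights across all positions.

```python
# Mapping a 32x32 single-channel image to another 32x32 output.
H = W = 32

# Fully connected: every output pixel gets its own weight for every
# input pixel, so the layer must learn (H*W)^2 weights.
dense_params = (H * W) ** 2

# 3x3 convolution: nine shared weights, reused at every spatial position.
conv_params = 3 * 3

print(dense_params, conv_params)  # 1048576 vs 9
```

That gap, roughly a factor of 100,000 here, is the inductive bias made visible: the convolution can only express local, translation-invariant patterns, and in exchange it needs vastly less data to learn them.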

We are steadily gaining more insight into just how flexible and powerful neural networks are by examining specific features of how they “think”. This is the territory of the emerging field of mechanistic interpretability (or “mech-int” if you’re in the tribe). The core intuition is that models learn human-comprehensible things and can therefore be understood.

Mechanistic interpretability in AI is like trying to understand a rock band. Just as each musician and instrument in the band contributes to the music, in AI, every model component plays a role in the system's behavior. By dissecting the “rock band” of an AI model and analyzing how each “instrument” or component contributes, we can gain a clearer understanding of how AI learns, making its processes more transparent and comprehensible. So far, mechanistic interpretability techniques have been applied largely to small-scale models and controlled scenarios. The bet is that these methods will scale to larger, more complex networks, and that insights gained from small models will remain relevant there, but no one knows this for sure.

This article is in three parts: why new capabilities might emerge at scale, adaptability and flexibility of learned algorithms, and what is happening when models learn to generalize.
